Abstract

Capacity formulas and random-coding exponents are derived for a generalized family of 
Gel'fand-Pinsker coding problems. These exponents yield asymptotic upper bounds on the
achievable log probability of error. In our model, information is to be reliably transmitted 
through a noisy channel with finite input and output alphabets and random state sequence, 
and the channel is selected by a hypothetical adversary. Partial information about the state 
sequence is available to the encoder, adversary, and decoder. The design of the transmitter is 
subject to a cost constraint. Two families of channels are considered: 1) compound discrete 
memoryless channels (CDMC), and 2) channels with arbitrary memory, subject to an additive 
cost constraint, or more generally to a hard constraint on the conditional type of the channel 
output given the input. Both problems are closely connected. The random-coding exponent is 
achieved using a stacked binning scheme and a maximum penalized mutual information decoder, 
which may be thought of as an empirical generalized Maximum a Posteriori decoder. For 
channels with arbitrary memory, the random-coding exponents are larger than their CDMC 
counterparts. Applications of this study include watermarking, data hiding, communication in
the presence of partially known interferers, and problems such as broadcast channels, all of which
involve the fundamental idea of binning. 

Index terms: channel coding with side information, error exponents, arbitrarily varying channels, universal coding and decoding, randomized codes, MAP decoding, random binning, capacity, reliability function, method of types, watermarking, data hiding, broadcast channels.
1 Introduction



In 1980, Gel'fand and Pinsker studied the problem of coding for a discrete memoryless channel (DMC) $p(y|x, s)$ with random states S that are observed by the encoder but not by the decoder [1]. They derived the capacity of this channel and showed it is achievable by a random binning scheme and a joint-typicality decoder. Applications of their work include computer memories with defects [2], writing on dirty paper, and communication in the presence of a known interference [3, 4, 5, 6]. Duality with source coding problems with side information was explored in [2, 8, 9]. In the late 1990's, it was discovered that the problems of embedding and hiding information in cover signals are closely related to the Gel'fand-Pinsker problem: the cover signal plays the role of the state sequence in the Gel'fand-Pinsker problem [10, 11, 12]. Capacity expressions were derived under expected distortion constraints for the transmitter and a memoryless adversary [12]. One difference between the basic Gel'fand-Pinsker problem and the various formulations of data-hiding and watermarking problems resides in the amount of side information available to the encoder, channel designer (adversary), and decoder. A unified framework for studying such problems is considered in this paper. The encoder, adversary and decoder have access to degraded versions $\mathbf{s}^e$, $\mathbf{s}^a$, $\mathbf{s}^d$, respectively,
of a state sequence s. Capacity is obtained as the solution to a mutual-information game: 

$$C = \sup \min\, [I(U; YS^d) - I(U; S^e)],$$

where U is an auxiliary random variable, and the sup and min are subject to appropriate constraints. 

In problems such as data hiding, the assumption of a fixed channel is untenable when the channel is under partial control of an adversary. This motivated the game-theoretic approach of [12], where the worst channel in a class of memoryless channels was derived, and capacity is the solution to a maxmin mutual-information game. This game-theoretic approach was recently extended by Cohen and Lapidoth [13] and Somekh-Baruch and Merhav [14, 15], who considered a class of channels with arbitrary memory, subject to almost-sure distortion constraints. In the special case of private data hiding, in which the cover signal is known to both the encoder and the decoder, Somekh-Baruch and Merhav also derived random-coding and sphere-packing exponents [13]. Binning is not needed in this scenario. The channel model of [13, 14, 15] is different from but reminiscent of the classical memoryless arbitrarily varying channel (AVC) [16, 17, 18], which is often used to analyze jamming problems. In the classical AVC model, no side information is available to the encoder or decoder. Error exponents for this problem were derived by Ericson [19] and Hughes and Thomas [20]. The capacity of the AVC with side information at the encoder was derived by Ahlswede [21].

The coding problems considered in this paper are motivated by data hiding applications in which the decoder has partial¹ or no knowledge of the cover signal. In all cases, capacity is achievable by random-binning schemes. Roughly speaking, the encoder designs a codebook for the auxiliary U. The selected sequence $\mathbf{U}$ plays the role of input to a fictitious channel and conveys information about both the encoder's state sequence $\mathbf{S}^e$ and the message M to the decoder. Finding the best error exponents for such schemes is challenging. Initial attempts in this direction for the Gel'fand-Pinsker DMC have been reported by Haroutunian et al. [22, 23], but errors were later discovered [24, 25]. Very recently, random-coding exponents have been independently obtained by Haroutunian and Tonoyan [26] and Somekh-Baruch and Merhav [27]. Their results and ours [28] were presented at the 2004 ISIT conference.

¹ For instance, the decoder may have access to a noisy, compressed version of the original cover signal.

The random-coding exponents we have derived cannot be achieved by standard binning schemes and standard maximum mutual information (MMI) decoders [16, 18]. Instead, we use a stack of variable-size codeword arrays indexed by the type of the encoder's state sequence $\mathbf{S}^e$. The appropriate decoder is a maximum penalized mutual information (MPMI) decoder, where the penalty is a function of the encoder's state sequence type. The MPMI decoder may be thought of as an empirical generalized MAP decoder, just as the conventional MMI decoder may be thought of as an empirical MAP decoder.

This paper is organized as follows. A statement of the problem is given in Sec. 2, together with basic definitions. Our main results are stated in Sec. 3 in the form of four theorems. An application to binary alphabets under Hamming cost constraints for the transmitter and adversary is given in Sec. 4. Proofs of the theorems appear in Sec. 5 and the sections that follow. All derivations are based on the method of types [29]. The paper concludes with a discussion and appendices.

1.1 Notation 

We use uppercase letters for random variables, lowercase letters for individual values, and boldface fonts for sequences. The p.m.f. of a random variable $X \in \mathcal{X}$ is denoted by $p_X = \{p_X(x),\ x \in \mathcal{X}\}$, and the probability of a set $\Omega$ under $p_X$ is denoted by $P_X(\Omega)$. The entropy of a random variable X is denoted by $H(X)$, and the mutual information between two random variables X and Y is denoted by $I(X;Y) = H(X) - H(X|Y)$, or by $I_{XY}(p_{XY})$ when the dependency on $p_{XY}$ should be explicit; similarly, we sometimes use the notation $I_{XY|Z}(p_{XYZ})$. The Kullback-Leibler divergence between two p.m.f.'s p and q is denoted by $D(p\|q)$. We denote by $D(p_{Y|X}\|q_{Y|X}|p_X) = D(p_{Y|X}p_X\|q_{Y|X}p_X)$ the conditional Kullback-Leibler divergence of $p_{Y|X}$ and $q_{Y|X}$ with respect to $p_X$. The base-2 logarithm of x is denoted by $\log x$, and the natural logarithm is denoted by $\ln x$.

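Following the notation in Csiszár and Körner [16], let $p_{\mathbf{x}}$ denote the type of a sequence $\mathbf{x} \in \mathcal{X}^N$ ($p_{\mathbf{x}}$ is an empirical p.m.f. over $\mathcal{X}$) and $T_{\mathbf{x}}$ the type class associated with $p_{\mathbf{x}}$, i.e., the set of all sequences of type $p_{\mathbf{x}}$. Likewise, we define the joint type $p_{\mathbf{x}\mathbf{y}}$ of a pair of sequences $(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^N \times \mathcal{Y}^N$ (a p.m.f. over $\mathcal{X} \times \mathcal{Y}$) and $T_{\mathbf{x}\mathbf{y}}$ the type class associated with $p_{\mathbf{x}\mathbf{y}}$, i.e., the set of all sequences of type $p_{\mathbf{x}\mathbf{y}}$. We define the conditional type $p_{\mathbf{y}|\mathbf{x}}$ of a pair of sequences $(\mathbf{x}, \mathbf{y})$ as $p_{\mathbf{x}\mathbf{y}}(x, y)/p_{\mathbf{x}}(x)$ for all $x \in \mathcal{X}$ such that $p_{\mathbf{x}}(x) > 0$. The conditional type class $T_{\mathbf{y}|\mathbf{x}}$ is the set of all sequences $\tilde{\mathbf{y}}$ such that $(\mathbf{x}, \tilde{\mathbf{y}}) \in T_{\mathbf{x}\mathbf{y}}$. We denote by $H(\mathbf{x})$ the entropy of the p.m.f. $p_{\mathbf{x}}$ and by $I(\mathbf{x}; \mathbf{y})$ the mutual information for the joint p.m.f. $p_{\mathbf{x}\mathbf{y}}$. Recall that

$$(N+1)^{-|\mathcal{X}|}\, 2^{NH(\mathbf{x})} \le |T_{\mathbf{x}}| \le 2^{NH(\mathbf{x})} \tag{1.1}$$

and

$$(N+1)^{-|\mathcal{X}||\mathcal{Y}|}\, 2^{NH(\mathbf{y}|\mathbf{x})} \le |T_{\mathbf{y}|\mathbf{x}}| \le 2^{NH(\mathbf{y}|\mathbf{x})}. \tag{1.2}$$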
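For binary sequences, $|T_{\mathbf{x}}|$ is a binomial coefficient, so the bounds (1.1) can be checked directly. The sketch below assumes $\mathcal{X} = \{0, 1\}$ and base-2 logarithms, as in the notation above; the function names are illustrative only.

```python
from math import comb, log2


def binary_entropy(t: float) -> float:
    """h(t) = -t log2 t - (1 - t) log2(1 - t), with h(0) = h(1) = 0."""
    if t in (0.0, 1.0):
        return 0.0
    return -t * log2(t) - (1 - t) * log2(1 - t)


def type_class_bounds(n: int, k: int) -> tuple:
    """(lower bound, exact size, upper bound) of |T_x| in (1.1) for a binary
    sequence x of length n containing k ones."""
    size = comb(n, k)                            # |T_x|: placements of k ones among n slots
    upper = 2.0 ** (n * binary_entropy(k / n))   # 2^{N H(x)}
    lower = upper / (n + 1) ** 2                 # (N+1)^{-|X|} 2^{N H(x)} with |X| = 2
    return lower, size, upper


if __name__ == "__main__":
    n = 20
    for k in range(0, n + 1, 5):
        lower, size, upper = type_class_bounds(n, k)
        print(f"k={k:2d}: {lower:10.1f} <= {size:7d} <= {upper:10.1f}")
```

The polynomial factor $(N+1)^{-|\mathcal{X}|}$ is what makes $|T_{\mathbf{x}}| \doteq 2^{NH(\mathbf{x})}$ on the exponential scale.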

We let $\mathcal{P}_X$ and $\mathcal{P}_X^{[N]}$ represent the set of all p.m.f.'s and empirical p.m.f.'s, respectively, for a random variable X. Likewise, $\mathcal{P}_{Y|X}$ and $\mathcal{P}_{Y|X}^{[N]}$ denote the set of all conditional p.m.f.'s and all empirical conditional p.m.f.'s, respectively, for a random variable Y given X. The notations $f(N) \ll g(N)$, $f(N) = O(g(N))$, and $f(N) \gg g(N)$ indicate that $\lim_{N\to\infty} |f(N)/g(N)|$ is zero, finite but nonzero, and infinite, respectively. The shorthands $f(N) \doteq g(N)$, $f(N) \mathrel{\dot{\le}} g(N)$ and $f(N) \mathrel{\dot{\ge}} g(N)$ denote equality and inequality on the exponential scale:

$$\lim_{N\to\infty} \frac{1}{N} \ln \frac{f(N)}{g(N)} = 0, \qquad \lim_{N\to\infty} \frac{1}{N} \ln \frac{f(N)}{g(N)} \le 0, \qquad \lim_{N\to\infty} \frac{1}{N} \ln \frac{f(N)}{g(N)} \ge 0,$$

respectively. We let $\mathbb{1}\{x \in \Omega\}$ denote the indicator function of a set $\Omega$, and $\mathcal{U}(\Omega)$ denote the uniform p.m.f. over a finite set $\Omega$. We define $|t|^+ = \max(0, t)$, $\exp_2(t) = 2^t$, and $h(t) = -t\log t - (1-t)\log(1-t)$ (the binary entropy function). We adopt the notational convention that the minimum of a function over an empty set is $+\infty$.



2 Statement of the Problem 



Our generic problem of communication with side information at the encoder and decoder is diagrammed in Fig. 1. There, three versions $\mathbf{S}^e$, $\mathbf{S}^a$, and $\mathbf{S}^d$ of a state sequence are available to the encoder, adversary and decoder, respectively. We use the shorthand $\mathbf{S}$ to denote the joint state sequence $(\mathbf{S}^e, \mathbf{S}^a, \mathbf{S}^d)$. This sequence consists of independent and identically distributed (i.i.d.) samples drawn from a p.m.f. $p(s^e, s^a, s^d)$. The individual sequences $\mathbf{S}^e$, $\mathbf{S}^a$, $\mathbf{S}^d$ are available noncausally to the encoder, adversary and decoder, respectively. The adversary's channel is of the form $p_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}(\mathbf{y}|\mathbf{x}, \mathbf{s}^a)$. This includes the problems listed in Table 1 as special cases. The alphabets $\mathcal{S}^e$, $\mathcal{S}^a$, $\mathcal{S}^d$, $\mathcal{X}$ and $\mathcal{Y}$ are finite.



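[Figure 1 block diagram: state sequences $(\mathbf{S}^e, \mathbf{S}^a, \mathbf{S}^d) \sim p(s^e, s^a, s^d)$; message $M \in \{1, \dots, 2^{NR}\}$; encoder of randomized code $\mathcal{C}$ outputs $\mathbf{X}$; channel $p(\mathbf{y}|\mathbf{x}, \mathbf{s}^a)$ outputs $\mathbf{Y}$; decoder outputs $\hat{M}$.]

Figure 1: Communication with side information at the encoder and decoder. Cost constraints are imposed on the encoder and channel.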



Problem                          S^a           S^d           Binning?
Gel'fand-Pinsker [1]             S^e           ∅             yes
Public watermarking [12, 15]     ∅             ∅             yes
Semiblind watermarking [12]      ∅             S^d ≠ S^e     yes
Cover-Chiang [7]                 (S^e, S^d)    S^d           yes
Private watermarking [12, 14]    ∅             S^e           no

Table 1: Relation between S^e, S^a, and S^d for various coding problems with side information.
A message M is to be transmitted to a decoder; M is uniformly distributed over the message set $\mathcal{M}$. The transmitter produces a sequence $\mathbf{X} = f_N(\mathbf{S}^e, M)$. The adversary passes $\mathbf{X}$ through the channel $p_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}(\mathbf{y}|\mathbf{x}, \mathbf{s}^a)$ to produce corrupted data $\mathbf{Y}$. The decoder does not know the channel $p_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}$ selected by the adversary, and has access to $\mathbf{S}^d$. The decoder produces an estimate $\hat{M} = g_N(\mathbf{Y}, \mathbf{S}^d) \in \mathcal{M}$ of the transmitted message.²

We allow the encoder/decoder pair $(f_N, g_N)$ to be randomized, i.e., the choice of $(f_N, g_N)$ is a function of a random variable known to the encoder and decoder but not to the adversary. This random variable is independent of all other random variables and plays the role of a secret key. The randomized code will be denoted by $(F_N, G_N)$.

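To summarize, the random variables $M, F_N, G_N, \mathbf{S}^e, \mathbf{S}^a, \mathbf{S}^d, \mathbf{X}$ and $\mathbf{Y}$ have joint p.m.f.

$$p_M(m)\, p_{F_NG_N}(f_N, g_N) \left[\prod_{i=1}^{N} p_{S^eS^aS^d}(s_i^e, s_i^a, s_i^d)\right] \mathbb{1}\{\mathbf{x} = f_N(\mathbf{s}^e, m)\}\, p_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}(\mathbf{y}|\mathbf{x}, \mathbf{s}^a).$$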



2.1 Constrained Side-Information Codes 

A cost function $\Gamma : \mathcal{S}^e \times \mathcal{X} \to \mathbb{R}^+$ is defined to quantify the cost $\Gamma(s^e, x)$ of transmitting symbol x when the channel state at the encoder is $s^e$. This definition is extended to N-vectors using $\Gamma^N(\mathbf{s}^e, \mathbf{x}) = \frac{1}{N}\sum_{i=1}^{N} \Gamma(s_i^e, x_i)$. In information embedding applications, $\Gamma$ is a distortion function measuring the distortion between host signal and marked signal.

We now define a class of codes satisfying maximum-cost constraints (Def. 2.1) and a class of codes satisfying average-cost constraints (Def. 2.2). The latter class is of course larger than the former. We also define a class of randomly modulated (RM) codes (Def. 2.3), adopting terminology from [20].

Def. 2.2 is analogous to the definition of a length-N information hiding code in [12]. The common source of randomness between encoder and decoder appears via the distribution $p_{F_NG_N}(f_N, g_N)$, whereas in [12] it appears via a cryptographic key sequence k with finite entropy rate.

Definition 2.1 A length-N, rate-R, randomized code with side information and maximum cost $D_1$ is a triple $(\mathcal{M}, F_N, G_N)$, where

• $\mathcal{M}$ is the message set of cardinality $|\mathcal{M}| = \lceil 2^{NR} \rceil$;

• $(F_N, G_N)$ has joint distribution $p_{F_NG_N}(f_N, g_N)$;

• $f_N : (\mathcal{S}^e)^N \times \mathcal{M} \to \mathcal{X}^N$ is the encoder mapping the state sequence $\mathbf{s}^e$ and message m to the transmitted sequence $\mathbf{x} = f_N(\mathbf{s}^e, m)$. The mapping is subject to the cost constraint
$$\Gamma^N(\mathbf{s}^e, f_N(\mathbf{s}^e, m)) \le D_1 \quad \text{almost surely } (p_{\mathbf{S}^e}\, p_{F_N}\, p_M); \tag{2.1}$$

• $g_N : \mathcal{Y}^N \times (\mathcal{S}^d)^N \to \mathcal{M} \cup \{e\}$ is the decoder mapping the received sequence $\mathbf{y}$ and channel state sequence $\mathbf{s}^d$ to a decoded message $\hat{m} = g_N(\mathbf{y}, \mathbf{s}^d)$. The decision $\hat{m} = e$ is a declaration of error.

² At first sight, the problem setup could be simplified by eliminating the variable $S^a$ and considering the "average channel" $p_{Y|XS^e}(y|x, s^e) = \sum_{s^a} p_{Y|XS^a}(y|x, s^a)\, p_{S^a|S^e}(s^a|s^e)$. We do not follow this approach because $p_{S^a|S^e}$ is fixed and $p_{Y|XS^a}$ is optimized by the adversary; hence these p.m.f.'s appear separately in the problem formulation and its solution. A similar comment applies to $S^d$.

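Definition 2.2 A length-N, rate-R, randomized code with side information and expected cost $D_1$ is a triple $(\mathcal{M}, F_N, G_N)$ which satisfies the same conditions as in Def. 2.1, except that (2.1) is replaced with the weaker constraint

$$\sum_{\mathbf{s}^e} p_{\mathbf{S}^e}(\mathbf{s}^e) \sum_{f_N} p_{F_N}(f_N)\, \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} \Gamma^N(\mathbf{s}^e, f_N(\mathbf{s}^e, m)) \le D_1. \tag{2.2}$$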



Definition 2.3 A randomly modulated (RM) code with side information is a randomized code defined via permutations of a prototype $(f_N, g_N)$. Such codes are of the form

$$\mathbf{x} = F_N(\mathbf{s}^e, m) = \pi^{-1} f_N(\pi\mathbf{s}^e, m),$$
$$G_N(\mathbf{y}, \mathbf{s}^d) = g_N(\pi\mathbf{y}, \pi\mathbf{s}^d),$$

where $\pi$ is chosen uniformly from the set of all $N!$ permutations and is not revealed to the adversary. The sequence $\pi\mathbf{x}$ is obtained by applying $\pi$ to the elements of $\mathbf{x}$.
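A minimal sketch of Def. 2.3, under assumptions that are not in the text: binary alphabets, the private case $\mathbf{s}^d = \mathbf{s}^e$, Hamming cost $\Gamma(s^e, x) = \mathbb{1}\{x \neq s^e\}$, and a hypothetical two-message prototype (`CODEBOOK`, `proto_encoder`, `proto_decoder`) chosen only for illustration. It checks two consequences of the definition: an additive cost $\Gamma^N$ is unchanged by the permutation, so the RM code meets the same cost constraint as its prototype, and the decoder undoes $\pi$ exactly.

```python
import random
from typing import List, Sequence

# Hypothetical prototype codebook: message m -> pattern of host flips (weight 2 out of N = 8).
CODEBOOK = {0: [1, 1, 0, 0, 0, 0, 0, 0], 1: [0, 0, 0, 0, 0, 0, 1, 1]}


def apply_perm(perm: Sequence[int], v: Sequence[int]) -> List[int]:
    """pi v: the i-th output symbol is v[perm[i]]."""
    return [v[j] for j in perm]


def apply_inverse_perm(perm: Sequence[int], v: Sequence[int]) -> List[int]:
    """pi^{-1} v: undoes apply_perm, i.e. apply_inverse_perm(perm, apply_perm(perm, v)) == v."""
    out = [0] * len(v)
    for i, j in enumerate(perm):
        out[j] = v[i]
    return out


def cost(s: Sequence[int], x: Sequence[int]) -> float:
    """Gamma^N(s, x) for the assumed Hamming cost Gamma(s, x) = 1{x != s}."""
    return sum(a != b for a, b in zip(s, x)) / len(s)


def proto_encoder(s: Sequence[int], m: int) -> List[int]:
    """Toy prototype f_N: flip the host s wherever CODEBOOK[m] is 1."""
    return [si ^ ci for si, ci in zip(s, CODEBOOK[m])]


def proto_decoder(y: Sequence[int], s_d: Sequence[int]) -> int:
    """Toy prototype g_N (private case s^d = s^e): closest flip pattern in Hamming distance."""
    diff = [yi ^ si for yi, si in zip(y, s_d)]
    return min(CODEBOOK, key=lambda m: sum(a != b for a, b in zip(diff, CODEBOOK[m])))


def rm_encoder(perm: Sequence[int], s: Sequence[int], m: int) -> List[int]:
    """F_N(s^e, m) = pi^{-1} f_N(pi s^e, m)."""
    return apply_inverse_perm(perm, proto_encoder(apply_perm(perm, s), m))


def rm_decoder(perm: Sequence[int], y: Sequence[int], s_d: Sequence[int]) -> int:
    """G_N(y, s^d) = g_N(pi y, pi s^d)."""
    return proto_decoder(apply_perm(perm, y), apply_perm(perm, s_d))


if __name__ == "__main__":
    rng = random.Random(0)
    n = len(CODEBOOK[0])
    perm = rng.sample(range(n), n)             # secret permutation shared by encoder and decoder
    s = [rng.randint(0, 1) for _ in range(n)]
    x = rm_encoder(perm, s, 1)
    print(cost(s, x), rm_decoder(perm, x, s))  # cost 2/8, same as the prototype; message 1 recovered
```

Without noise the decoder recovers m for every $\pi$; the point of the random permutation is that the adversary, who does not know $\pi$, cannot target the positions the prototype modifies.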

2.2 Constrained Attack Channels 

Next we define a class $\mathcal{A}$ of DMC's (Def. 2.4) and a corresponding class $\mathcal{P}_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}[\mathcal{A}]$ of channels with arbitrary memory (CAM) in which the conditional type of $\mathbf{y}$ given $(\mathbf{x}, \mathbf{s}^a)$ is constrained (Def. 2.5).

Definition 2.4 A compound DMC (CDMC) class $\mathcal{A}$ is any compact (under the $L_1$ norm) subset of $\mathcal{P}_{Y|XS^a}$.

For CDMC's, we have $p_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}(\mathbf{y}|\mathbf{x}, \mathbf{s}^a) = \prod_{i=1}^{N} p_{Y|XS^a}(y_i|x_i, s_i^a)$, where $p_{Y|XS^a} \in \mathcal{A}$. The set $\mathcal{A}$ is defined according to the application.

1. In the case of a known channel [1], $\mathcal{A}$ is a singleton.

2. In information hiding problems [12], $\mathcal{A}$ is the class of DMC's that introduce expected distortion between X and Y at most equal to $D_2$:
$$\sum_{s^a, x, y} p_{XS^a}(x, s^a)\, p_{Y|XS^a}(y|x, s^a)\, d(x, y) \le D_2, \tag{2.3}$$
where $d : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^+$ is a distortion function. $\mathcal{A}$ can also be defined to be a subset of the above class.

3. In some applications, A could be defined via multiple cost constraints. 
Given a p.m.f. $p_{XUS^e}$, we denote by $\mathcal{P}_{YS^aS^d|XUS^e}[\mathcal{A}, p_{XUS^e}]$ the class of DMC's $p_{YS^aS^d|XUS^e}$ whose conditional marginal $p_{Y|XS^a}$ is in the CDMC class $\mathcal{A}$.

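Definition 2.5 The CAM class $\mathcal{P}_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}[\mathcal{A}]$ is the set of channels such that for any channel input $(\mathbf{x}, \mathbf{s}^a)$ and output $\mathbf{y}$, the conditional type $p_{\mathbf{y}|\mathbf{x}\mathbf{s}^a}$ belongs to $\mathcal{A} \cap \mathcal{P}_{Y|XS^a}^{[N]}$ with probability 1:

$$\Pr[p_{\mathbf{y}|\mathbf{x}\mathbf{s}^a} \in \mathcal{A}] = 1. \tag{2.4}$$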



If $\mathcal{A}$ is defined via the distortion constraint (2.3), let $d^N(\mathbf{x}, \mathbf{y}) = \frac{1}{N}\sum_{i=1}^{N} d(x_i, y_i)$. Condition (2.4) may then be rewritten as

$$\Pr[d^N(\mathbf{x}, \mathbf{y}) \le D_2] = 1, \tag{2.5}$$

i.e., feasible channels have total distortion bounded by $ND_2$ and arbitrary memory.³ Comparing the CDMC class $\mathcal{A}$ and the CAM class $\mathcal{P}_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}[\mathcal{A}]$, we see that 1) for $(\mathbf{X}, \mathbf{S}^a, \mathbf{Y})$ in any given type class, the conditional p.m.f. of $\mathbf{Y}$ given $(\mathbf{X}, \mathbf{S}^a)$ is uniform in the CDMC case but not necessarily so in the CAM case, and 2) while conditional types $p_{\mathbf{y}|\mathbf{x}\mathbf{s}^a} \notin \mathcal{A}$ may have exponentially vanishing probability under the CDMC model, such types are prohibited in the CAM case. One may expect that both factors have an effect on capacity and random-coding exponents. As we shall see, only the latter factor has an effect on random-coding exponents.

The relation between the CAM class $\mathcal{P}_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}[\mathcal{A}]$ in (2.5) and the classical AVC model [16] is detailed in Appendix A. The class (2.5) is not a special case of the classical AVC model because arbitrary memory is allowed.

We also introduce the following class of attack channels, which turn out to be the worst CAM 
channels for the problems considered in this paper. 

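Definition 2.6 An attack channel $p_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}$ uniform over single conditional types is defined via a mapping $\mathsf{A} : \mathcal{P}_{XS^a}^{[N]} \to \mathcal{P}_{Y|XS^a}^{[N]}$ such that, with probability 1, the channel output $\mathbf{y}$ has conditional type $p_{\mathbf{y}|\mathbf{x}\mathbf{s}^a} = \mathsf{A}(p_{\mathbf{x}\mathbf{s}^a})$. Moreover, $\mathbf{Y}$ is uniformly distributed over the corresponding conditional type class.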
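The following sketch instantiates Def. 2.6 for binary $\mathbf{x}$, with $S^a$ absent and Hamming distortion; these are illustrative assumptions, and the function names are hypothetical. Fixing how many zeros and how many ones of $\mathbf{x}$ are flipped fixes the conditional type $p_{\mathbf{y}|\mathbf{x}}$; choosing the flipped positions uniformly within each group makes $\mathbf{Y}$ uniform over that conditional type class. If the two counts sum to at most $ND_2$, constraint (2.5) holds with probability 1.

```python
import random
from typing import List, Sequence


def uniform_conditional_type_attack(x: Sequence[int], flips_on_zeros: int,
                                    flips_on_ones: int, rng: random.Random) -> List[int]:
    """Flip exactly `flips_on_zeros` positions where x_i = 0 and exactly
    `flips_on_ones` positions where x_i = 1, each set chosen uniformly at random.

    Every output y then has the same conditional type p_{y|x}, and y is uniform
    over that conditional type class T_{y|x}."""
    zeros = [i for i, xi in enumerate(x) if xi == 0]
    ones = [i for i, xi in enumerate(x) if xi == 1]
    if flips_on_zeros > len(zeros) or flips_on_ones > len(ones):
        raise ValueError("requested conditional type is incompatible with the type of x")
    flipped = set(rng.sample(zeros, flips_on_zeros)) | set(rng.sample(ones, flips_on_ones))
    return [1 - xi if i in flipped else xi for i, xi in enumerate(x)]


def hamming_distortion(x: Sequence[int], y: Sequence[int]) -> float:
    """d^N(x, y) = (1/N) sum_i 1{x_i != y_i}."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [0] * 12 + [1] * 8          # N = 20; budget N * D2 = 2 flips for D2 = 0.1
    for _ in range(3):
        y = uniform_conditional_type_attack(x, 1, 1, rng)
        print(y, hamming_distortion(x, y))
```

Here the flip counts play the role of the mapping $\mathsf{A}$ evaluated at the type of $\mathbf{x}$; in the paper's setting they would be chosen by the adversary as a function of $p_{\mathbf{x}\mathbf{s}^a}$.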

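Lastly, given a type $p_{\mathbf{x}\mathbf{u}\mathbf{s}^e}$, we denote by $\mathcal{P}^{[N]}_{YS^aS^d|XUS^e}[\mathcal{A}, p_{\mathbf{x}\mathbf{u}\mathbf{s}^e}]$ the class of conditional types $p_{\mathbf{y}\mathbf{s}^a\mathbf{s}^d|\mathbf{x}\mathbf{u}\mathbf{s}^e}$ such that $p_{\mathbf{y}|\mathbf{x}\mathbf{s}^a}$ is in the CAM class $\mathcal{P}_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}[\mathcal{A}]$.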



2.3 Probability of Error 

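The average probability of error for a deterministic code $(f_N, g_N)$ when channel $p_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}$ is in effect is given by

$$P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}) = \Pr(\hat{M} \neq M) = \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} \sum_{\mathbf{s}^e, \mathbf{s}^a, \mathbf{s}^d} \sum_{\mathbf{x}} \sum_{\mathbf{y} :\, g_N(\mathbf{y}, \mathbf{s}^d) \neq m} p_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}(\mathbf{y}|\mathbf{x}, \mathbf{s}^a)\, \mathbb{1}\{\mathbf{x} = f_N(\mathbf{s}^e, m)\}\, p_{\mathbf{S}^e\mathbf{S}^a\mathbf{S}^d}(\mathbf{s}^e, \mathbf{s}^a, \mathbf{s}^d). \tag{2.6}$$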



³ The case of channels with arbitrary memory subject to expected-distortion constraints admits a trivial solution: the adversary "obliterates" $\mathbf{X}$ with a fixed, nonzero probability that depends on $D_2$ but not on N, and therefore no reliable communication is possible in the sense of Def. 2.7 below.
For a randomized code, the expression above is averaged with respect to $p_{F_NG_N}(f_N, g_N)$; this average is denoted by $P_e(F_N, G_N, p_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a})$. The minmax probability of error for the class of randomized codes and the class of attack channels considered is given by

$$P_{e,N}^* = \min_{p_{F_NG_N}} \max_{p_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}} \sum_{f_N, g_N} p_{F_NG_N}(f_N, g_N)\, P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}\mathbf{S}^a}). \tag{2.7}$$

Definition 2.7 A rate R is said to be achievable if $P_{e,N}^* \to 0$ as $N \to \infty$.



Definition 2.8 The capacity $C(D_1, \mathcal{A})$ is the supremum of all achievable rates.



Definition 2.9 The reliability function of the class of attack channels considered is

$$E(R) = \liminf_{N\to\infty} -\frac{1}{N} \log P_{e,N}^*. \tag{2.8}$$



There are four combinations of maximum/expected cost constraints for the transmitter and CDMC/CAM designs for the adversary (four flavors of the generalized Gel'fand-Pinsker problem), and a question is whether the same capacity and error exponents are obtained in all four cases. We now define transmit channels, which play a crucial role in deriving capacity and error exponents.

Definition 2.10 Given alphabets $\mathcal{X}$, $\mathcal{U}$ and $\mathcal{S}^e$, a transmit channel $p_{XU|S^e}$ is a conditional p.m.f. that satisfies the following distortion constraint on the conditional marginal $p_{X|S^e}$:

$$\sum_{x, u, s^e} p_{XU|S^e}(x, u|s^e)\, p_{S^e}(s^e)\, \Gamma(s^e, x) \le D_1.$$



Given an alphabet $\mathcal{U}$ of cardinality L, we denote by $\mathcal{P}_{XU|S^e}(L, D_1)$ the set of feasible transmit channels. Note that transmit channels have been termed covert channels [12] and watermarking channels [14, 15] in the context of information hiding. In those papers, the channel $p_{Y|X}$ was termed attack channel; we retain this terminology for $p_{Y|XS^a}$ in this paper.



2.4 Preliminaries 



Consider a sextuple of random variables $(S^e, S^a, S^d, U, X, Y)$ with joint p.m.f. $p_{S^eS^aS^dUXY}$, where U is an auxiliary random variable taking values in $\mathcal{U} = \{1, 2, \dots, L\}$. The following difference of mutual informations plays a fundamental role in the capacity analysis [1]–[15] of channels with side information. It plays a central role in the analysis of error exponents as well:

$$J_L(p_{S^eS^aS^dUXY}) = I(U; YS^d) - I(U; S^e). \tag{2.9}$$

Note that $J_L$ depends on $p_{S^eS^aS^dUXY}$ only via the marginal $p_{US^eS^dY}$; moreover, the cardinality L of the alphabet $\mathcal{U}$ has been made explicit in the definition (2.9).
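A minimal sketch of how (2.9) can be evaluated numerically. It assumes the joint p.m.f. is given as a dictionary over outcome tuples $(s^e, u, y, s^d)$; the coordinates $x$ and $s^a$ are left out because, as just noted, $J_L$ depends only on $p_{US^eS^dY}$. All names are illustrative.

```python
from collections import defaultdict
from math import log2

# Coordinate order of every outcome tuple in the joint p.m.f. (an assumption of this sketch).
SE, U, Y, SD = 0, 1, 2, 3


def entropy(pmf: dict) -> float:
    """Shannon entropy in bits of a p.m.f. given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def marginal(joint: dict, idx: tuple) -> dict:
    """Marginal p.m.f. of the coordinates listed in idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)


def mutual_information(joint: dict, a_idx: tuple, b_idx: tuple) -> float:
    """I(A; B) = H(A) + H(B) - H(A, B) for two disjoint groups of coordinates."""
    return (entropy(marginal(joint, a_idx)) + entropy(marginal(joint, b_idx))
            - entropy(marginal(joint, a_idx + b_idx)))


def j_functional(joint: dict) -> float:
    """J_L = I(U; Y S^d) - I(U; S^e), eq. (2.9), from a p.m.f. over (s^e, u, y, s^d)."""
    return mutual_information(joint, (U,), (Y, SD)) - mutual_information(joint, (U,), (SE,))


if __name__ == "__main__":
    # U uniform and independent of S^e, Y = U, S^d constant: J = 1 - 0 bit.
    independent = {(se, u, u, 0): 0.25 for se in (0, 1) for u in (0, 1)}
    # U = S^e and Y = U: the gain I(U; Y S^d) is cancelled by the binning cost I(U; S^e).
    copy_state = {(s, s, s, 0): 0.5 for s in (0, 1)}
    print(j_functional(independent), j_functional(copy_state))
```

The second example shows why the subtracted term matters: an auxiliary variable that merely copies the encoder's state conveys no message information.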
Channel capacity for the problems studied in [1]–[15] is given by

$$C(D_1, \mathcal{A}) = \lim_{L\to\infty} \max_{p_{XU|S^e}} \min_{p_{Y|XS^a}} J_L(p_{S^eS^aS^d}\, p_{XU|S^e}\, p_{Y|XS^a}), \tag{2.10}$$

where restrictions are imposed on the joint distribution of $(S^e, S^a, S^d)$ (including the absence of some of these variables; see Table 1), and the maximization over $p_{XU|S^e}$ and minimization over $p_{Y|XS^a}$ are possibly subject to cost constraints.

The cardinality of the alphabet $\mathcal{U}$ may be unbounded [15, p. 514]⁴, hence the infinite range for L in (2.10). To evaluate (2.10) in the case $S^a = \emptyset$, Moulin and O'Sullivan [12] claimed that one can choose $L = |\mathcal{S}^e||\mathcal{X}| + 1$ without loss of optimality. The proof is based on Carathéodory's theorem, as suggested in [1]. However, the proof in [12] applies only to the fixed-channel case.⁵

The use of alphabets with unbounded cardinality introduces some technical subtleties. The following two lemmas are straightforward but will be useful. The proof of the first one is based on the nested nature of the feasible sets $\mathcal{P}_{XU|S^e}(L, D_1)$, $1 \le L < \infty$.

Lemma 2.1 Let $\mathcal{U} = \{1, 2, \dots, L\}$ and $\psi_L$ a functional defined over $\mathcal{P}_{XU|S^e}(L, D_1)$. Then

$$\psi_L^* = \max_{p_{XU|S^e} \in \mathcal{P}_{XU|S^e}(L, D_1)} \psi_L(p_{XU|S^e}) \tag{2.11}$$

is a nondecreasing function of L.

Proof. We need to prove that $\psi_L^* \le \psi_{L+1}^*$ for any L. Let $p_{XU|S^e}^*$ achieve the maximum defining $\psi_L^*$ and define the extended p.m.f. $p_{XU|S^e}^{\mathrm{ext}}$ over $\{1, 2, \dots, L+1\}$ as follows:

$$p_{XU|S^e}^{\mathrm{ext}}(x, u|s^e) = \begin{cases} p_{XU|S^e}^*(x, u|s^e), & u = 1, 2, \dots, L, \\ 0, & u = L+1. \end{cases}$$

Since $p_{XU|S^e}^{\mathrm{ext}}$ and $p_{XU|S^e}^*$ have the same conditional marginal $p_{X|S^e}$, we have $p_{XU|S^e}^{\mathrm{ext}} \in \mathcal{P}_{XU|S^e}(L+1, D_1)$, and

$$\psi_{L+1}(p_{XU|S^e}^{\mathrm{ext}}) = \psi_L(p_{XU|S^e}^*).$$

Therefore

$$\psi_L^* = \psi_{L+1}(p_{XU|S^e}^{\mathrm{ext}}) \le \psi_{L+1}^*.$$

□

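Lemma 2.2 Given three compact sets $\mathcal{P}$, $\mathcal{Q}$, $\mathcal{R}$ and a functional $\phi : \mathcal{P} \times \mathcal{Q} \times \mathcal{R} \to \mathbb{R}$, let $(p^*, q^*, r^*)$ achieve the min max min in

$$\min_{p \in \mathcal{P}} \max_{q \in \mathcal{Q}} \min_{r \in \mathcal{R}} \phi(p, q, r). \tag{2.12}$$

It is assumed that $\phi$ is continuous in an $L_1$ neighborhood of $(p^*, q^*, r^*)$. Then, given three sequences of subsets $\mathcal{P}_n$, $\mathcal{Q}_n$, $\mathcal{R}_n$, $1 \le n < \infty$, respectively dense in $\mathcal{P}$, $\mathcal{Q}$ and $\mathcal{R}$ under the $L_1$ norm, we have the following property:

$$\min_{p \in \mathcal{P}} \max_{q \in \mathcal{Q}} \min_{r \in \mathcal{R}} \phi(p, q, r) = \lim_{n\to\infty} \min_{p \in \mathcal{P}_n} \max_{q \in \mathcal{Q}_n} \min_{r \in \mathcal{R}_n} \phi(p, q, r). \tag{2.13}$$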



⁴ The capacity formula in [15, Corollary 1, p. 514] was obtained under the restriction of constant-composition codes.

⁵ Equation (A7) in the proof of [12, Prop. 4.1(iv)] is associated with a fixed DMC.
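Proof: Denote the left side of (2.13) by $\phi^{***}$ and the argument of the limit in the right side by $\Phi_n$. We have $\underline{\Phi}_n \le \Phi_n \le \overline{\Phi}_n$, where

$$\underline{\Phi}_n = \min_{p \in \mathcal{P}} \max_{q \in \mathcal{Q}_n} \min_{r \in \mathcal{R}} \phi(p, q, r), \qquad \overline{\Phi}_n = \min_{p \in \mathcal{P}_n} \max_{q \in \mathcal{Q}} \min_{r \in \mathcal{R}_n} \phi(p, q, r).$$

Since the maximization (resp. minimizations) defining $\underline{\Phi}_n$ (resp. $\overline{\Phi}_n$) is over a dense subset of $\mathcal{Q}$ (resp. $\mathcal{P} \times \mathcal{R}$), we have

$$\lim_{n\to\infty} \underline{\Phi}_n = \lim_{n\to\infty} \overline{\Phi}_n = \phi^{***}.$$

Hence $\lim_{n\to\infty} \Phi_n = \phi^{***}$. □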

Finally, recall that the Kullback-Leibler divergence [16] and related functionals (including mutual information $I(p_{XY}) = D(p_{XY}\|p_Xp_Y)$ and $J_L$ functionals) are continuous with respect to the $L_1$ norm. For instance, for any L, any p.m.f.'s p and p' with finite values of $J_L$, and any $\epsilon > 0$, there exists $\delta > 0$ such that

$$\|p - p'\| < \delta \implies |J_L(p) - J_L(p')| < \epsilon,$$

where the norm on $p - p'$ is the $L_1$ norm.



3 Main Results 

The main tool used to prove the coding theorems in this paper is the method of types [29]. Our random-coding schemes are binning schemes in which the auxiliary random variable U is input to a fictitious channel.

In all derivations, optimal types for sextuples $(\mathbf{s}^e, \mathbf{s}^a, \mathbf{s}^d, \mathbf{u}, \mathbf{x}, \mathbf{y})$ are obtained as solutions to maxmin problems. Two key facts used to prove the theorems are: 1) the number of conditional types is polynomial in N, and 2) in the CAM case, the worst attacks are uniform over conditional types, as in Somekh-Baruch and Merhav's watermarking capacity game [15]. Proofs of the theorems appear in Sec. 5 and the sections that follow. Related, known results for CDMC's without side information are summarized in Appendix B.

The expression (2.10), restated below in a slightly different form, turns out to be a capacity expression for the problems considered in this paper (Theorems 3.6 and 3.7):

$$C = C(D_1, \mathcal{A}) = \lim_{L\to\infty} C_L, \tag{3.1}$$

where

$$C_L = \max_{p_{XU|S^e} \in \mathcal{P}_{XU|S^e}(L, D_1)} \min_{p_{Y|XS^a} \in \mathcal{A}} J_L(p_{S^eS^aS^d}\, p_{XU|S^e}\, p_{Y|XS^a}). \tag{3.2}$$

By application of Lemma 2.1, the sequence $C_L$ is nondecreasing in L.

In the special case of degenerate $p_{S^eS^d}$ (no side information at the encoder and decoder), it is known that the maximum above is achieved by U = X, and capacity reduces to the standard formula $C = \max_{p_X} \min_{p_{Y|XS^a}} I_{XY}(p_X\, p_{Y|XS^a}\, p_{S^a})$. If $S^e = S^d = S$ and $S^a = \emptyset$ (private watermarking), the optimal choice is again U = X, and $C = \max_{p_{X|S}} \min_{p_{Y|X}} I_{XY|S}(p_{X|S}\, p_{Y|X}\, p_S)$.
3.1 Random-Coding Exponents for CDMC Model

Lemma 3.1 The function

$$E_{r,L}^{\mathrm{CDMC}}(R) = \min_{\tilde{p}_{S^e} \in \mathcal{P}_{S^e}} \max_{p_{XU|S^e} \in \mathcal{P}_{XU|S^e}(L, D_1)} \min_{\tilde{p}_{YS^aS^d|XUS^e} \in \mathcal{P}_{YS^aS^d|XUS^e}} \min_{p_{Y|XS^a} \in \mathcal{A}} \Big[ D(\tilde{p}_{S^e}\, p_{XU|S^e}\, \tilde{p}_{YS^aS^d|XUS^e} \,\|\, p_{S^eS^aS^d}\, p_{XU|S^e}\, p_{Y|XS^a}) + |J_L(\tilde{p}_{S^e}\, p_{XU|S^e}\, \tilde{p}_{YS^aS^d|XUS^e}) - R|^+ \Big] \tag{3.3}$$

satisfies the following properties:

(i) $E_{r,L}^{\mathrm{CDMC}}(R) = 0$ if and only if $R \ge C_L$;

(ii) $E_{r,L}^{\mathrm{CDMC}}(R) \le |C_L - R|^+$;

(iii) $E_{r,L}^{\mathrm{CDMC}}(R) \le E_{r,L+1}^{\mathrm{CDMC}}(R)$ (nondecreasing in L).



Proof.

(i) Clearly $E_{r,L}^{\mathrm{CDMC}}(R) \ge 0$, with equality if and only if the following three conditions are met:

1. the minimizing $\tilde{p}_{S^e}$ in (3.3) is equal to $p_{S^e}$,

2. the minimizing $\tilde{p}_{YS^aS^d|XUS^e}$ in (3.3) is equal to $p_{Y|XS^a}\, p_{S^aS^d|S^e}$, and

3. $R \ge C_L$.

(ii) This upper bound on (3.3) is obtained by fixing $\tilde{p}_{S^e} = p_{S^e}$ and $\tilde{p}_{YS^aS^d|XUS^e} = p_{Y|XS^a}\, p_{S^aS^d|S^e}$. The upper bound is achieved if the minimizing $\tilde{p}_{S^e}$ and $\tilde{p}_{YS^aS^d|XUS^e}$ in (3.3) are equal to $p_{S^e}$ and $p_{Y|XS^a}\, p_{S^aS^d|S^e}$, respectively.

(iii) This is a direct consequence of Lemma 2.1. □

Theorem 3.2 For the CDMC case (Def. 2.4) with maximum-cost constraint (2.1) or expected-cost constraint (2.2) on the transmitter, the reliability function is lower-bounded by the random-coding error exponent

$$E_r^{\mathrm{CDMC}}(R) = \lim_{L\to\infty} E_{r,L}^{\mathrm{CDMC}}(R). \tag{3.4}$$

Moreover, $E_r^{\mathrm{CDMC}}(R) = 0$ if and only if $R \ge C$.

For any value of L, the random-coding error exponent $E_{r,L}^{\mathrm{CDMC}}(R)$ of (3.3) is achieved by a binning code with conditionally constant composition and the MPMI decoder. We now present a brief overview of this scheme and an interpretation of the MPMI decoder.
Figure 2: Representation of the binning scheme as a stack of arrays indexed by the encoder's state sequence type $\lambda$. The arrays have $2^{NR}$ columns and $2^{N\rho(\lambda)}$ rows, and the random-coding exponent is optimized by choosing $\rho(\lambda) = I_{US^e}(\lambda) + \epsilon$.



For notational simplicity, here we use the shorthand $\lambda$ to denote the type of the encoder's state sequence (recall there is a polynomial number of such types). Let $\mathcal{U} = \{1, 2, \dots, L\}$, where the value of L is arbitrary. Referring to Fig. 2, to each value of $\lambda$ corresponds an array

$$\mathcal{C}(\lambda) = \{\mathbf{u}_{lm|\lambda},\ 1 \le l \le 2^{N\rho(\lambda)},\ 1 \le m \le |\mathcal{M}|\} \tag{3.5}$$

of codewords, drawn uniformly from some optimized type class. We refer to $\rho(\lambda)$ as the depth parameter of the array $\mathcal{C}(\lambda)$. The codebook $\mathcal{C}$ is the union of these arrays. Each array has exponential size, but the number of arrays is polynomial in N.

The array depth parameter $\rho(\lambda)$ is designed to optimally balance the probability of encoding error and the probability of decoding error, conditioned on the encoder's state sequence type $\lambda$. Upon seeing m and $\mathbf{s}^e$, the encoder evaluates the type $\lambda$ of $\mathbf{s}^e$ and seeks a codeword $\mathbf{u}_{lm|\lambda}$ that belongs to some optimized conditional type class $T_{\mathbf{u}|\mathbf{s}^e}$. Let $I_{US^e}(\lambda)$ denote the empirical mutual information associated with $T_{\mathbf{u}|\mathbf{s}^e}$. An encoding error arises when no codeword can be found in the conditional type class $T_{\mathbf{u}|\mathbf{s}^e}$. The probability of that event does not vanish when $\rho(\lambda) < I_{US^e}(\lambda)$ but vanishes at a double-exponential rate when $\rho(\lambda) \ge I_{US^e}(\lambda) + \epsilon$. The probability of decoding error increases exponentially with $\rho(\lambda)$. Therefore the optimal tradeoff is given by $\rho(\lambda) = I_{US^e}(\lambda) + \epsilon$.
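The double-exponential behavior of the encoding-error probability can be seen in a deliberately simplified model, which is an assumption of this sketch and not the paper's exact analysis: the $2^{N\rho}$ codewords in a bin each land in $T_{\mathbf{u}|\mathbf{s}^e}$ independently with probability exactly $2^{-NI}$, with the polynomial factors from (1.2) ignored. Then $P(\text{failure}) = (1 - 2^{-NI})^{2^{N\rho}}$, and $\ln P(\text{failure}) \approx -2^{N(\rho - I)}$.

```python
from math import log1p


def log_encoding_failure(n: int, rho: float, i_us: float) -> float:
    """Natural log of P(no codeword in the bin hits the target conditional type class).

    Simplified model (assumption): 2^{n*rho} codewords, each hitting independently
    with probability 2^{-n*i_us}, so P = (1 - 2^{-n*i_us})^(2^{n*rho})."""
    hit = 2.0 ** (-n * i_us)
    return (2.0 ** (n * rho)) * log1p(-hit)   # log1p keeps precision for tiny hit


if __name__ == "__main__":
    i_us = 0.5
    for n in (20, 50, 100):
        below = log_encoding_failure(n, i_us - 0.1, i_us)   # rho < I: failure prob stays near 1
        above = log_encoding_failure(n, i_us + 0.1, i_us)   # rho > I: ln P is about -2^{0.1 n}
        print(n, below, above)
```

With a margin of 0.1 bit, $\ln P$ is about $-2^{5}$ at $N = 50$ and about $-2^{10}$ at $N = 100$, whereas a deficit of 0.1 bit leaves the failure probability close to 1.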

Instead of choosing $\rho(\lambda)$ as a function of $\lambda$, a suboptimal design would be to fix the value of $\rho$ and draw all the codewords uniformly and i.i.d. from a single type class $T_{\mathbf{u}}$. The scheme would then be more akin to the original Gel'fand-Pinsker binning scheme, which uses a single array of codewords (drawn i.i.d. from a p.m.f. $p_U$). When $\rho$ is fixed, the fact that a polynomial number of equal-size arrays is used rather than a single array is inconsequential as far as error exponents are concerned.

The MPMI decoder is matched to the selected random binning scheme. Given $(\mathbf{y}, \mathbf{s}^d)$, the MPMI decoder seeks the codeword in $\mathcal{C} = \bigcup_{\lambda} \mathcal{C}(\lambda)$ that achieves the maximum of the penalized empirical mutual information criterion

$$\hat{m}_{\mathrm{MPMI}} = \arg\max_m \max_{l, \lambda} \left[ I(\mathbf{u}_{lm|\lambda}; \mathbf{y}\mathbf{s}^d) - \rho(\lambda) \right]. \tag{3.6}$$



As the proof of Theorem 3.2 indicates, the penalty $\rho(\lambda)$ is optimal among all functions of $\lambda$; the optimal penalty is thus matched to the array depth parameter.
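A minimal sketch of the MPMI rule (3.6). The codebook layout (a dictionary keyed by $(\lambda, l, m)$) and the function names are hypothetical; $(y_i, s^d_i)$ is treated as a single super-symbol so that $I(\mathbf{u}; \mathbf{y}\mathbf{s}^d)$ is the mutual information of a joint type. It does not model codebook generation or encoding.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, Sequence, Tuple


def empirical_mi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Mutual information (bits) of the joint type of two equal-length sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    # For empirical p.m.f.'s, p(x,y) / (p(x) p(y)) = c * n / (count_a[x] * count_b[y]).
    return sum(c / n * log2(c * n / (count_a[x] * count_b[y])) for (x, y), c in joint.items())


def mpmi_decode(codebook: Dict[Tuple[Hashable, int, int], Sequence[int]],
                rho: Dict[Hashable, float],
                y: Sequence[int], s_d: Sequence[int]) -> int:
    """argmax_m max_{l, lam} [ I(u_{lm|lam}; y s^d) - rho(lam) ], cf. (3.6).

    codebook maps (lam, l, m) -> codeword u_{lm|lam}; rho maps lam -> penalty."""
    ys = list(zip(y, s_d))                  # (y_i, s^d_i) as one super-symbol
    (_, _, m), _ = max(codebook.items(),
                       key=lambda item: empirical_mi(item[1], ys) - rho[item[0][0]])
    return m


if __name__ == "__main__":
    codebook = {("a", 1, 0): [0, 0, 0, 0, 1, 1, 1, 1],
                ("b", 1, 1): [0, 1, 0, 1, 0, 1, 0, 1]}
    y, s_d = [0, 1, 0, 1, 0, 1, 0, 1], [0] * 8
    print(mpmi_decode(codebook, {"a": 0.0, "b": 0.0}, y, s_d))   # unpenalized: exact match wins
    print(mpmi_decode(codebook, {"a": 0.0, "b": 2.0}, y, s_d))   # heavy penalty on type "b"
```

In this toy example the matching codeword has 1 bit of empirical mutual information with $(\mathbf{y}, \mathbf{s}^d)$ and the other has 0 bits, so a penalty larger than 1 bit on its array flips the decision; this is the mechanism by which deeper arrays are handicapped in (3.6).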

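The MPMI decoder may be thought of as an empirical generalized MAP decoder. Indeed, all messages are equiprobable, and the encoding procedure ensures that for any given type $\lambda$, all bins are equiprobable as well. The probability of the pair (m, l) is thus equal to $1/|\mathcal{C}(\lambda)|$ for all l, m. Hence, given $\mathcal{C}(\lambda)$, the a priori distribution of the codewords is uniform: $p(\mathbf{u}_{lm|\lambda}) = 1/|\mathcal{C}(\lambda)| = 2^{-N(R + \rho(\lambda))}$. Therefore

$$\rho(\lambda) = -R - \frac{1}{N} \log p(\mathbf{u}_{lm|\lambda}) \qquad \forall\, l, m. \tag{3.7}$$

We may write

$$I(\mathbf{u}_{lm|\lambda}; \mathbf{y}\mathbf{s}^d) = \frac{1}{N} \log \frac{\hat{p}(\mathbf{y}, \mathbf{s}^d | \mathbf{u}_{lm|\lambda})}{\hat{p}(\mathbf{y}, \mathbf{s}^d)}, \tag{3.8}$$

where $\hat{p}$ denotes an empirical p.m.f. or empirical conditional p.m.f. Substituting (3.7) and (3.8) into (3.6), we obtain

$$\hat{m}_{\mathrm{MPMI}} = \arg\max_m \max_{l, \lambda} \left[ \frac{1}{N} \log \frac{\hat{p}(\mathbf{y}, \mathbf{s}^d | \mathbf{u}_{lm|\lambda})\, p(\mathbf{u}_{lm|\lambda})}{\hat{p}(\mathbf{y}, \mathbf{s}^d)} + R \right] = \arg\max_m \max_{l, \lambda} \hat{p}(\mathbf{u}_{lm|\lambda} | \mathbf{y}\mathbf{s}^d). \tag{3.9}$$

This may be thought of as an empirical version of the generalized MAP decoder

$$\hat{m}_{\mathrm{GMAP}} = \arg\max_m \max_{l, \lambda} p(\mathbf{u}_{lm|\lambda} | \mathbf{y}, \mathbf{s}^d) = \arg\max_m \max_{l, \lambda} \frac{p(\mathbf{y}, \mathbf{s}^d | \mathbf{u}_{lm|\lambda})\, p(\mathbf{u}_{lm|\lambda})}{p(\mathbf{y}, \mathbf{s}^d)}, \tag{3.10}$$

which requires knowledge of the channel from $\mathbf{u}$ to $(\mathbf{y}, \mathbf{s}^d)$. We do not know whether the GMAP decoder is as good (on the exponential scale) as the optimal MAP decoder

$$\hat{m}_{\mathrm{MAP}} = \arg\max_m p(m | \mathbf{y}, \mathbf{s}^d) = \arg\max_m \sum_{l, \lambda} p(\mathbf{u}_{lm|\lambda} | \mathbf{y}, \mathbf{s}^d), \tag{3.11}$$

which averages out the nuisance parameters $(l, \lambda)$ and is more difficult to analyze.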
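The difference between (3.10) and (3.11) is a maximum versus a sum over the nuisance parameters $(l, \lambda)$. The sketch below uses made-up posterior values $p(\mathbf{u}_{lm|\lambda} | \mathbf{y}, \mathbf{s}^d)$, not derived from any channel, to show that the two rules can disagree: one message owns the single most likely codeword, while the other collects more total posterior mass.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

Posterior = Dict[Tuple[int, int, Hashable], float]   # (m, l, lam) -> p(u_{lm|lam} | y, s^d)


def gmap_decode(posterior: Posterior) -> int:
    """Generalized MAP, cf. (3.10): argmax_m max_{l, lam} p(u_{lm|lam} | y, s^d)."""
    best: Dict[int, float] = defaultdict(float)
    for (m, _l, _lam), p in posterior.items():
        best[m] = max(best[m], p)
    return max(best, key=best.get)


def map_decode(posterior: Posterior) -> int:
    """MAP, cf. (3.11): argmax_m sum_{l, lam} p(u_{lm|lam} | y, s^d)."""
    total: Dict[int, float] = defaultdict(float)
    for (m, _l, _lam), p in posterior.items():
        total[m] += p
    return max(total, key=total.get)


if __name__ == "__main__":
    # Hypothetical posteriors over four codewords (they sum to 1).
    posterior = {(0, 1, "a"): 0.4, (0, 2, "a"): 0.0, (1, 1, "a"): 0.3, (1, 2, "b"): 0.3}
    print(gmap_decode(posterior), map_decode(posterior))
```

GMAP selects message 0 (largest single posterior, 0.4), whereas MAP selects message 1 (total posterior 0.6).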

The MPMI decoder is matched to the encoding scheme in that the same function $\rho(\lambda)$ is used as the depth parameter of the array $\mathcal{C}(\lambda)$ and as the penalty in the decoding function. As the proof of Theorem 3.2 indicates, any other choice of the penalty function would in general result in a lower error exponent. This is not surprising in view of the above generalized MAP interpretation.

3.2 Random-Coding Exponents for CAM Model 

We now turn our attention to the CAM channel model. First we state the following lemma, which is analogous to Lemma 3.1.
Lemma 3.3 The function

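$$E^{\mathrm{CAM}}_{r,L}(R) \triangleq \min_{\tilde{p}_{S^e}} \max_{p_{XU|S^e} \in \mathcal{P}_{XU|S^e}(L, D_1)} \min_{\tilde{p}_{YS^aS^d|XUS^e} \in \mathcal{P}_{YS^aS^d|XUS^e}[\mathcal{A}, p_{XU|S^e}, \tilde{p}_{S^e}]} \Big[ D(\tilde{p}_{S^eS^aS^d} \,\|\, p_{S^eS^aS^d}) + I_{Y;US^eS^d|XS^a}(\tilde{p}_{S^e} p_{XU|S^e} \tilde{p}_{YS^aS^d|XUS^e}) + |J_L(\tilde{p}_{S^e} p_{XU|S^e} \tilde{p}_{YS^aS^d|XUS^e}) - R|^+ \Big] \qquad (3.12)$$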
satisfies the following properties: 

(i) $E^{\mathrm{CAM}}_{r,L}(R) = 0$ if and only if $R \ge C_L$;

(ii) $E^{\mathrm{CAM}}_{r,L}(R) \le |C_L - R|^+$;

(iii) $E^{\mathrm{CAM}}_{r,L}(R) \le E^{\mathrm{CAM}}_{r,L+1}(R)$ (monotonicity in $L$).

Theorem 3.4 For the CAM case (Def. 2.5) with maximum-cost constraint (2.1) or expected-cost constraint (2.2) on the transmitter, the reliability function is lower-bounded by the random-coding error exponent

$$E^{\mathrm{CAM}}_r(R) = \lim_{L\to\infty} E^{\mathrm{CAM}}_{r,L}(R). \qquad (3.13)$$

Moreover, $E^{\mathrm{CAM}}_r(R) = 0$ if and only if $R \ge C$.

For any value of $L$, the random-coding error exponent (3.12) is achieved by a randomly-modulated code with conditionally constant composition, stacked binning, and an MPMI decoder.
The worst attack channel is uniform over single conditional types (Def. [2U|) . 



3.3 Comparison of Random-Coding Exponents for CDMC and CAM Models 

For both the CDMC and the CAM models, it should be noted that: 

1. the worst type classes $T_{s^e}$, $T_{ys^as^d|xus^e}$, and best type class $T_{xu|s^e}$ (in an appropriate min-max-min sense) determine the error exponents;

2. the order of the min, max and min is determined by the knowledge available to the encoder. The encoder knows $s^e$ and can optimize $T_{xu|s^e}$, but has no control over $T_{ys^as^d|xus^e}$;

3. the straight-line part of $E_r(R)$ results from the union bound;

4. random codes are generally suboptimal at low rates. 

Theorems 3.2 and 3.4 imply the following relationship between error exponents in the CDMC and CAM cases.

Corollary 3.5 $E^{\mathrm{CDMC}}_r(R) \le E^{\mathrm{CAM}}_r(R) \le |C - R|^+$ for all $R$.
Proof. Fix $L$. Using the relation

$$I(Y; US^eS^d \mid XS^a) = D(p_{YXUS^eS^aS^d} \,\|\, p_{Y|XS^a}\, p_{XUS^eS^aS^d}),$$

we write

$$I_{Y;US^eS^d|XS^a}(\tilde{p}_{S^e} p_{XU|S^e} \tilde{p}_{YS^aS^d|XUS^e}) = D(\tilde{p}_{S^e} p_{XU|S^e} \tilde{p}_{YS^aS^d|XUS^e} \,\|\, \tilde{p}_{S}\, p_{XU|S^e}\, \tilde{p}_{Y|XS^a}),$$

where we have defined the marginal conditional p.m.f.

$$\tilde{p}_{Y|XS^a}(y|x,s^a) = \frac{\sum_{u,s^e,s^d} \tilde{p}_{YS^aS^d|XUS^e}(y,s^a,s^d|x,u,s^e)\, p_{XU|S^e}(x,u|s^e)\, \tilde{p}_{S^e}(s^e)}{\sum_{y,u,s^e,s^d} \tilde{p}_{YS^aS^d|XUS^e}(y,s^a,s^d|x,u,s^e)\, p_{XU|S^e}(x,u|s^e)\, \tilde{p}_{S^e}(s^e)}.$$



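Since $\tilde{p}_{YS^aS^d|XUS^e}$ is an element of $\mathcal{P}_{YS^aS^d|XUS^e}[\mathcal{A}, p_{XU|S^e}, \tilde{p}_{S^e}]$ in (3.12), $\tilde{p}_{Y|XS^a}$ defined above is an element of $\mathcal{A}$ and may be viewed as a functional of $\tilde{p}_{YS^aS^d|XUS^e}$ (for fixed $p_{XU|S^e}$ and $\tilde{p}_{S^e}$). Hence the cost function in (3.12) may be written as

$$\begin{aligned} & D(\tilde{p}_S \,\|\, p_S) + D(\tilde{p}_{S^e} p_{XU|S^e} \tilde{p}_{YS^aS^d|XUS^e} \,\|\, \tilde{p}_S\, p_{XU|S^e}\, \tilde{p}_{Y|XS^a}) + |J_L - R|^+ \\ &= D(\tilde{p}_{S^e} p_{XU|S^e} \tilde{p}_{YS^aS^d|XUS^e} \,\|\, p_{S^eS^aS^d}\, p_{XU|S^e}\, \tilde{p}_{Y|XS^a}) + |J_L - R|^+, \end{aligned}$$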

where the equality follows from the chain rule for divergence. Thus the cost functions in (3.12) and (3.3) are identical; the only difference is the domain over which the minimizations are performed. In (3.3), the minimization over $\tilde{p}_{YS^aS^d|XUS^e}$ is unconstrained, and the minimization over $p_{Y|XS^a}$ is over $\mathcal{A}$. In (3.12), the minimization over $\tilde{p}_{YS^aS^d|XUS^e}$ is constrained to the set $\mathcal{P}_{YS^aS^d|XUS^e}[\mathcal{A}, p_{XU|S^e}, \tilde{p}_{S^e}]$, and $\tilde{p}_{Y|XS^a}$ is a fixed element of $\mathcal{A}$ once $\tilde{p}_{YS^aS^d|XUS^e}$ is fixed. In other words, the minimization in (3.3) is over a larger set, and we have $E^{\mathrm{CDMC}}_{r,L}(R) \le E^{\mathrm{CAM}}_{r,L}(R)$. Taking the limits of both sides of this inequality as $L \to \infty$, we obtain $E^{\mathrm{CDMC}}_r(R) \le E^{\mathrm{CAM}}_r(R)$.

Similarly, from Lemma 3.3 we have $E^{\mathrm{CAM}}_{r,L}(R) \le |C_L - R|^+$; taking limits as $L \to \infty$, we obtain $E^{\mathrm{CAM}}_r(R) \le |C - R|^+$. □
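The two identities used in this proof, $I(Y;Z|X) = D(p_{XZY} \,\|\, p_{Y|X}\, p_{XZ})$ and the chain rule for divergence, can be spot-checked numerically on small alphabets. In the sketch below (an illustration, not the authors' code), $X$ stands in for the pair $(X, S^a)$ and $Z$ for $(U, S^e, S^d)$; the random p.m.f.'s are arbitrary test data with full support, and all helper names are hypothetical.

```python
import itertools
import math
import random

def normalize(w):
    s = sum(w.values())
    return {k: val / s for k, val in w.items()}

def marginal(p, idx):
    """Marginal of a joint p.m.f. (dict keyed by tuples) on the coordinates idx."""
    out = {}
    for k, val in p.items():
        key = tuple(k[i] for i in idx)
        out[key] = out.get(key, 0.0) + val
    return out

def entropy(p):
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def kl(p, q):
    return sum(v * math.log2(v / q[k]) for k, v in p.items() if v > 0)

rng = random.Random(0)
keys = list(itertools.product(range(2), range(3), range(2)))  # coordinates (x, z, y)
p = normalize({k: rng.random() + 0.05 for k in keys})
q = normalize({k: rng.random() + 0.05 for k in keys})

p_x, p_xz, p_xy = marginal(p, [0]), marginal(p, [0, 1]), marginal(p, [0, 2])

# I(Y; Z | X) from entropies: H(XY) + H(XZ) - H(X) - H(XZY) ...
cmi = entropy(p_xy) + entropy(p_xz) - entropy(p_x) - entropy(p)
# ... and as the divergence D(p_{XZY} || p_{Y|X} p_{XZ}).
ref = {(x, z, y): p_xy[(x, y)] / p_x[(x,)] * p_xz[(x, z)] for (x, z, y) in keys}
cmi_as_kl = kl(p, ref)

# Chain rule: D(p_{XZY} || q_{XZY}) = D(p_{XZ} || q_{XZ}) + D(p_{Y|XZ} || q_{Y|XZ} | p_{XZ}).
q_xz = marginal(q, [0, 1])
cond_kl = sum(v * math.log2((v / p_xz[k[:2]]) / (q[k] / q_xz[k[:2]])) for k, v in p.items())
chain_lhs = kl(p, q)
chain_rhs = kl(p_xz, q_xz) + cond_kl
print(cmi, cmi_as_kl, chain_lhs, chain_rhs)
```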

The inequality $E^{\mathrm{CDMC}}_r(R) \le E^{\mathrm{CAM}}_r(R)$ is not as surprising as it initially seems, because the proof of Theorem 3.4 shows there is no loss in optimality in considering CAMs that are uniform over conditional types, and there are more conditional types to choose from under the CDMC model. Generally that additional flexibility is beneficial for the adversary, and the worst conditional type does not satisfy the hard constraint (2.4). See Sec. 4 for an example.



Remark 3.1 In the absence of side information (degenerate $p_{S^eS^d}$), the optimal $U = X$, and (3.13) becomes $E^{\mathrm{CAM}}_r(R) = |C - R|^+$. The expression for $E_r(R)$ derived by Hughes and Thomas [20] (Eqns. (9) and (6); see also the observation at the top of p. 96) is upper-bounded by $|C - R|^+$; they also provide a binary-Hamming example in which equality is achieved. Our result implies that the upper bound $|C - R|^+$ is in fact achieved for any problem without side information in which there exists a hard constraint on the conditional type of the channel output given the input.



3.4 Capacity 

As discussed in Sec. 2.4, Gel'fand and Pinsker's proof of the converse theorem in [1] can be extended to more complex problems such as compound Gel'fand-Pinsker channels [12], [15]. The capacity for the generalized Gel'fand-Pinsker problem under the CDMC and CAM models is given in Theorems 3.6 and 3.7, respectively. Achievability of $C$ follows from Theorems 3.2 and 3.4. Indeed, for any $\epsilon > 0$, there exists $L(\epsilon)$ such that $C_L > C - \epsilon$. The proofs of the converses appear in Secs. 7 and 8.

Theorem 3.6 Under the CDMC model (Def. 2.4) for the adversary, capacity for the generalized Gel'fand-Pinsker problem is given by (3.1), under either the maximum-cost constraint (2.1) or the expected-cost constraint (2.2) on the transmitter.

Theorem 3.7 Under the CAM model (Def. 2.5) for the adversary, capacity for the generalized Gel'fand-Pinsker problem is given by (3.1), under either the maximum-cost constraint (2.1) or the expected-cost constraint (2.2) on the transmitter.

The proof of the CDMC converse is similar to that in [12]; the proof in the CAM case exploits 
the close connection between the CAM and CDMC problems. 

3.5 Remarks on Cardinality of U 

The sequence $C_L$ defined in (3.2) is nondecreasing and converges to the capacity limit $C$, but one may ask at what rate. When the feasible set $\mathcal{A}$ has finite cardinality, by application of Carathéodory's theorem, it suffices to select $L = |\mathcal{X}|\,|\mathcal{S}^e| + |\mathcal{A}| - 1$ (see [El], [30] for related problems). When $\mathcal{A}$ is a compact set, one may construct a sequence $\epsilon_L \downarrow 0$ and a sequence $\{\mathcal{A}_L\}$ of subsets of $\mathcal{A}$ that is dense in the $L_1$ norm:

$$\forall\, p_{Y|XS^a} \in \mathcal{A},\ \exists\, \hat{p}_{Y|XS^a} \in \mathcal{A}_L :\ \max_{x,s^a} \|p_{Y|XS^a}(\cdot|x,s^a) - \hat{p}_{Y|XS^a}(\cdot|x,s^a)\|_1 \le \epsilon_L.$$

This may be done, for instance, by applying a uniform quantizer to each $p_{Y|XS^a}(y|x,s^a)$ to obtain $\hat{p}_{Y|XS^a}(y|x,s^a)$. By continuity of the functional $J_L$, the effect of this quantization on $J_L$ can be made arbitrarily small by letting $L \to \infty$. Finally, Carathéodory's theorem can be applied to the set of $|\mathcal{A}_L|$ attack channels so that $\max_{p_{XU|S^e}} \min_{p_{Y|XS^a} \in \mathcal{A}_L} J_L(p_S\, p_{XU|S^e}\, p_{Y|XS^a})$ is achieved using $L = |\mathcal{X}|\,|\mathcal{S}^e| + |\mathcal{A}_L| - 1$. Proposition 3.8 below formally states this result when the feasible set of attack channels is defined by the distortion constraint (2.3).
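A minimal sketch of one such quantizer follows. The rounding rule (round each entry down to a multiple of $1/l$, then give the leftover mass to the largest entry) is an assumption chosen so that every output row is again a p.m.f.; the text only says "a uniform quantizer". Each quantized row lies on a grid with at most $(l+1)^{|\mathcal{Y}|}$ points, so the quantized channels form a finite set of size at most $(l+1)^{|\mathcal{Y}||\mathcal{X}||\mathcal{S}^a|}$, the kind of count that enters (3.14). The sketch does not check whether a quantized channel still satisfies the distortion constraint that defines $\mathcal{A}$.

```python
import math

def quantize_pmf(p, l):
    """Quantize a p.m.f. onto the grid {0, 1/l, ..., 1} (hypothetical rule).

    Every entry is rounded down to a multiple of 1/l; the leftover mass, itself
    a multiple of 1/l, is added to the largest entry so the result sums to one.
    The L1 error is then below 2 (len(p) - 1) / l.
    """
    k = max(range(len(p)), key=lambda i: p[i])
    counts = [math.floor(l * x) for x in p]
    counts[k] += l - sum(counts)
    return [c / l for c in counts]

def quantize_channel(channel, l):
    """Quantize every row p(. | x, s^a) of an attack channel."""
    return {cond: quantize_pmf(row, l) for cond, row in channel.items()}

def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

# A BSC-like attack channel indexed by (x, s^a), with a degenerate S^a.
channel = {(0, 0): [0.77, 0.23], (1, 0): [0.23, 0.77]}
quantized = quantize_channel(channel, 10)
print(quantized)  # {(0, 0): [0.8, 0.2], (1, 0): [0.2, 0.8]}
```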

Proposition 3.8 Consider the subsequence $\{C_L\}$ indexed by

$$L = |\mathcal{X}|\,|\mathcal{S}^e| + (l+1)^{|\mathcal{Y}|\,|\mathcal{X}|\,|\mathcal{S}^a|} - 1, \qquad l = 1, 2, \ldots \qquad (3.14)$$

Then 

c - 2\y\ ^L<c L <c. 

Proof. See Appendix G.

For the random-coding exponent $E_r(R)$, the idea is similar but the derivations are more involved because Kullback-Leibler divergence is not absolutely continuous with respect to its arguments; attack channels that lie on the boundary of the probability simplex require special treatment.

In Proposition 3.10 below, the random-coding exponent is viewed as a function of $D_2$ and, with a little abuse of notation, written as $E_r(D_2)$. The random-coding exponent when the alphabet $\mathcal{U}$ has size $L$ is similarly denoted by $E_{r,L}(D_2)$.
Lemma 3.9 The function $E_r(D_2)$ is continuous and nonincreasing in $D_2$.



The above statement is a consequence of the fact that the Kullback-Leibler and mutual-information functionals are continuous in their arguments, and that the sets $\{\mathcal{A}(D_2)\}$ are continuously nested.



Proposition 3.10 Denote by $c$ the minimum of $p_{S^aS^d|S^e}$ over its support set. Define
the constants 



max{ \y\ \S a \ \S d \, exp 2 



1 + 



--log(8|«S a | 5 |«S d |c) 



and $D = \max_{x,y} d(x, y)$. Consider the subsequence $\{E_{r,L}(D_2)\}$ indexed by



L=\x\\s^m x \^\ s ^-i, i = i mia ,i mm +i, 



(3.15) 



Then 



E r ,L(D 2 ) < E,(D 2 ) < E rtL D 2 



\y\\s°\\s«\D + n 2 M\ +nymlsi]1 oii 



The gap between the lower and upper bounds in (|3.16p is 0(- 2 i — ) as I — > oo. 
Proof: See Appendix H.



4 Binary-Hamming Case



In this section, we consider a problem of theoretical and practical interest where $\mathcal{S}^e = \{0, 1\}$, $S^e$ is a Bernoulli sequence with $\Pr[S^e = 1] = p_e = 1 - \Pr[S^e = 0]$, transmission is subject to the cost constraint (2.1) in which the cost function is Hamming distance, and the adversary is subject to the expected-distortion constraint (2.3) or to the maximum-distortion constraint (2.5), in which $d$ is also Hamming distance. In both cases the set $\mathcal{A}$ is given by (2.3). We study three cases:

Case I: $p_e = \frac{1}{2}$, $S^a = S^d = 0$. This problem is analogous to the public watermarking problem of [8], [9].

Case II: $p_e = \frac{1}{2}$, $S^a = 0$, $S^d = S^e$. This is the private watermarking problem of [12]. The CAM version of this problem is closely related to a problem studied by Csiszár and Narayan [17] and Hughes and Thomas [20].

Case III: Degenerate side information: $p_e = 0$, $S^e = S^a = S^d = 0$. Unlike [17], [20], the attacker's noise may depend on $X$.

In all three cases, we were able to derive some analytical results and to numerically evaluate error exponents. Capacity formulas for these problems are given below and illustrated in Fig. 3.

In this section we use the notation $p * q = p(1 - q) + (1 - p)q$.
Figure 3: Capacity functions for Cases I-III when $D_2 = 0.2$.

4.1 Case I: Public Watermarking 

Here $p_e = \frac{1}{2}$ and $S^a = S^d = 0$, so we have $S = S^e$. Capacity for a fixed-DMC problem (the adversary implements a binary symmetric channel (BSC) with crossover probability $D_2$) is given in Barron et al. [8] and Pradhan et al. [9]:

$$C_{\mathrm{pub}} = g^*(D_1, D_2) \triangleq \begin{cases} \frac{D_1}{\delta_2}\,[h(\delta_2) - h(D_2)], & \text{if } 0 \le D_1 < \delta_2; \\ h(D_1) - h(D_2), & \text{if } \delta_2 \le D_1 \le 1/2; \\ 1 - h(D_2), & \text{if } D_1 > 1/2, \end{cases} \qquad (4.1)$$

where $\delta_2 = 1 - 2^{-h(D_2)}$ and $h(\cdot)$ is the binary entropy function. The straight-line portion of the capacity function is achieved by time-sharing. Proposition 4.1 shows that the BSC is the worst channel for the CDMC and CAM classes considered.
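The piecewise formula (4.1) is easy to evaluate. The sketch below assumes $0 \le D_2 < 1/2$ and uses illustrative function names; it can also be used to confirm that the time-sharing segment meets the curve $h(D_1) - h(D_2)$ tangentially at $D_1 = \delta_2$, which follows from the identity $h(\delta_2) - \delta_2 h'(\delta_2) = -\log_2(1 - \delta_2) = h(D_2)$.

```python
import math

def h(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def c_pub(d1, d2):
    """Public-watermarking capacity g*(D1, D2) from (4.1), for 0 <= D2 < 1/2."""
    delta2 = 1 - 2 ** (-h(d2))
    if d1 > 0.5:
        return 1 - h(d2)
    if d1 >= delta2:
        return h(d1) - h(d2)
    return d1 / delta2 * (h(delta2) - h(d2))  # time-sharing segment

print(round(c_pub(0.4, 0.2), 4))  # 0.249
```

For $D_1 = 0.4$, $D_2 = 0.2$ (the setting of Figs. 4-6) we have $\delta_2 \approx 0.394 < D_1$, so $C_{\mathrm{pub}} = h(0.4) - h(0.2) \approx 0.249$, consistent with the straight-line CAM exponent of Proposition 4.2 evaluated at $R = 0$.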

Proposition 4.1 Capacity under the CDMC and CAM models defined by the distortion constraints (2.3) and (2.5), respectively, is equal to $C_{\mathrm{pub}}$ and is achieved for $|\mathcal{U}| = 2$.

Proof: See Appendix C.

Proposition 4.2 The random-coding error exponent is a straight line in the CAM case: $E^{\mathrm{CAM,pub}}_r(R) = |C_{\mathrm{pub}} - R|^+$ for all $R$. The minimizing $\tilde{p}_S$ in (3.12) coincides with $p_S$, the maximizing $L = |\mathcal{U}|$ is 2, and the minimizing $\tilde{p}_{Y|XUS}$ is the BSC $p_{Y|X}$ with crossover probability $D_2$.

Proof: See Appendix D.
Unlike the CAM case, in the CDMC case we have no guarantee that $L = 2$ is optimal for random-coding exponents. The exponents $E^{\mathrm{CDMC}}_{r,L}(R)$ and $E^{\mathrm{CAM}}_{r,L}(R)$ are shown in Fig. 4 for the case $D_1 = 0.4$, $D_2 = 0.2$, and $L = 2$; see Sec. 4.4 for details of these calculations. For the CDMC case, we have found numerically that the worst attack channel $p_{Y|X}$ is the BSC with crossover probability $D_2$, and that the worst-case $\tilde{p}_S$ in (3.4) coincides with $p_S$.

4.2 Case II: Private Watermarking 

Here $p_e = \frac{1}{2}$, $S^a = 0$, $S^d = S^e = S$.

Proposition 4.3 [12] Capacity is given by

$$C_{\mathrm{priv}} = h(D_1 * D_2) - h(D_2) \qquad (4.2)$$

and is achieved when $U = X$ ($L = 2$).
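Formula (4.2) can be evaluated the same way, using the convolution $p * q$ defined at the start of this section. The sketch below is illustrative (function names are hypothetical):

```python
import math

def h(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def star(p, q):
    """Binary convolution p * q = p(1 - q) + (1 - p) q."""
    return p * (1 - q) + (1 - p) * q

def c_priv(d1, d2):
    """Private-watermarking capacity (4.2): h(D1 * D2) - h(D2)."""
    return h(star(d1, d2)) - h(d2)

print(round(c_priv(0.4, 0.2), 4))  # 0.2677
```

For $D_1 = 0.4$, $D_2 = 0.2$, $D_1 * D_2 = 0.44$ and $C_{\mathrm{priv}} \approx 0.268$, slightly above the CAM zero-rate exponent reported in Sec. 4.4 for this case, as the bound $E_r(R) \le |C - R|^+$ of Corollary 3.5 requires.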

For the random-coding exponents, we have no guarantee that $L = 2$ is an optimal choice. The exponents $E^{\mathrm{CDMC}}_{r,L}(R)$ and $E^{\mathrm{CAM}}_{r,L}(R)$ in that case are shown in Fig. 5 for the case $D_1 = 0.4$, $D_2 = 0.2$. As in Case I, for both the CAM and CDMC cases, the worst-case $\tilde{p}_S$ in (3.3) and (3.12) coincides with $p_S$.

The capacity expression (4.2) was also derived for the AVC problem of Csiszár and Narayan [17], albeit with different assumptions ($p_e = 0$, i.e., degenerate side information, and channel state $\theta$ selected independently of $X$; see Appendix A). Error exponents for the latter problem were derived by Hughes and Thomas [20]. They obtained $E_r(R) = |C - R|^+$ at all rates below capacity.

4.3 Case III: Degenerate side information 

Here $p_e = 0$, $S^e = S^a = S^d = 0$.

Proposition 4.4 Capacity is the same as in the public watermarking game: $C_{\mathrm{deg}} = C_{\mathrm{pub}}$.
Proof: See Appendix E.

Proposition 4.5 $E^{\mathrm{CAM,deg}}_r(R) = |C_{\mathrm{deg}} - R|^+$ for all $R < C_{\mathrm{deg}}$.

Proof: Follows from Remark 3.1.

Unlike Case I and Case II, the worst attack is an asymmetric binary channel, favoring outputs with low Hamming weight. Error exponents in the case $D_1 = 0.4$, $D_2 = 0.2$ are given in Fig. 6.
4.4 Discussion

Comparing Figs. 4 and 5, we see that the random-coding error exponents for $L = 2$ are only slightly larger when side information is available to the decoder. For instance, the zero-rate exponents are 0.123 and 0.146 in the CDMC case, and 0.249 and 0.263 in the CAM case.

Some practical comments about the optimization problems solved in this section are in order. Among these problems, the calculations of random-coding exponents for the CDMC / public watermarking scenario are the most complicated, involving four layers of minimization or maximization. The number of parameters to be optimized is $8|\mathcal{U}| + 1$ (1 for $\tilde{p}_S$, $4|\mathcal{U}| - 2$ for $p_{XU|S}$, 2 for $p_{Y|X}$, and $4|\mathcal{U}|$ for $\tilde{p}_{Y|XUS}$). Other difficulties arise due to the lack of nice properties such as everywhere differentiability and convexity. There appears to be a substantial increase in computational difficulty going from $|\mathcal{U}| = 2$ to larger $|\mathcal{U}|$. Based on the analytical results derived above, it is tempting to conjecture that $|\mathcal{U}| = 2$ is a sufficient choice for optimality; unfortunately, at this time we are unable to validate that conjecture analytically or numerically.

We have used a genetic algorithm [31] to numerically solve the above-mentioned optimization 
problems. Advantages of genetic algorithms include easy implementation, robustness with respect 
to selection of starting points, no need for evaluation of function derivatives, and ability to handle 
high-dimensional problems. The parameters of a genetic algorithm may be selected to ensure that 
the algorithm is globally convergent. In particular, we have used an "elitist" genetic algorithm, in 
which the value of the best individual in each iteration is nondecreasing for a maximization problem 
(or nonincreasing for a minimization problem). The sequence of the best solutions in each iteration 
is guaranteed to converge to the global optimum almost surely [HU], [32].
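The GA settings of [31] are not reproduced here. The following is a generic, hypothetical real-coded elitist GA that only illustrates the elitism property described above (the best fitness is nondecreasing across generations). The box domain $[0,1]^{\mathrm{dim}}$, the operators, and the toy objective are assumptions; for the exponents of this section, each parameter vector would additionally have to be mapped onto the relevant probability simplices, with the nested min/max layers handled separately.

```python
import random

def elitist_ga(f, dim, pop_size=30, generations=200, sigma=0.1, seed=0):
    """Maximize f over the box [0, 1]^dim with a simple elitist genetic algorithm.

    Returns the best point found and the per-generation history of its fitness.
    Because the best individual always survives into the next generation,
    the history is nondecreasing.
    """
    rng = random.Random(seed)

    def clip(t):
        return min(1.0, max(0.0, t))

    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if f(a) >= f(b) else b

    best = max(pop, key=f)
    history = [f(best)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size - 1):
            p1, p2 = tournament(), tournament()
            # Uniform crossover followed by Gaussian mutation, clipped to the box.
            children.append([clip((a if rng.random() < 0.5 else b) + rng.gauss(0.0, sigma))
                             for a, b in zip(p1, p2)])
        pop = [best] + children  # elitism: the current best is carried over
        best = max(pop, key=f)
        history.append(f(best))
    return best, history

def objective(x):
    return -(x[0] - 0.3) ** 2 - (x[1] - 0.7) ** 2

best, history = elitist_ga(objective, dim=2)
print(best, history[-1])
```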
5 Proof of Theorem 3.2



Owing to Lemma 3.1, for any $\epsilon > 0$, chosen independently of $N$, there exists $L(\epsilon)$ such that

$$E^{\mathrm{CDMC}}_{r,L}(R) \ge E^{\mathrm{CDMC}}_r(R) - \epsilon, \qquad \forall L \ge L(\epsilon). \qquad (5.1)$$

We shall prove the existence of a sequence of codes $(f_N, g_N)$ such that

$$\lim_{N\to\infty} -\frac{1}{N}\log \max_{p_{Y|XS^a} \in \mathcal{A}} P_e(f_N, g_N, p_{Y|XS^a}) \ge E^{\mathrm{CDMC}}_{r,L}(R).$$



The proof is given for the maximum-cost constraint (2.1) on the transmitter. Since this constraint is stronger than the expected-cost constraint (2.2), any code that achieves the error exponent $E^{\mathrm{CDMC}}_{r,L}(R)$ under (2.1) is also feasible under (2.2). A random ensemble of binning codes is constructed, and it is shown that the error probability averaged over this ensemble vanishes exponentially with $N$ at rate $E^{\mathrm{CDMC}}_{r,L}(R)$. Since the error probability functional $P_e(f_N, g_N, p_{Y|XS^a})$ is continuous in $p_{Y|XS^a}$ and the feasible set $\mathcal{A}$ of attack channels can be approximated with arbitrary precision (in the $L_1$ norm) by a subset whose cardinality is polynomial in $N$, there exists a code $(f_N, g_N)$ from the ensemble that achieves $E^{\mathrm{CDMC}}_{r,L}(R)$ uniformly over $\mathcal{A}$. It is therefore sufficient to prove that



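$$\lim_{N\to\infty} -\frac{1}{N}\log \max_{p_{Y|XS^a} \in \mathcal{A}} P_e(F_N, G_N, p_{Y|XS^a}) \ge E^{\mathrm{CDMC}}_{r,L}(R) \qquad (5.2)$$

for the random ensemble considered. Combining (5.2) and (5.1) then proves the claim.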
The maximum-cost constraint (2.1) may be written as

$$\sum_{s^e, x} p_{s^e x}(s^e, x)\, \Gamma(s^e, x) \le D_1 \quad \text{a.s.} \qquad (5.3)$$



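Assume $R < C_L - \epsilon$. Define the function

$$E^{\mathrm{CDMC}}_{r,L,N}(R, p_{s^e}, p_{xu|s^e}) \triangleq \min_{p_{ys^as^d|xus^e} \in \mathcal{P}^{[N]}_{YS^aS^d|XUS^e}} \min_{p_{Y|XS^a} \in \mathcal{A}} \Big[ D(p_{s^e} p_{xu|s^e} p_{ys^as^d|xus^e} \,\|\, p_{S^eS^aS^d}\, p_{XU|S^e}\, p_{Y|XS^a}) + |J_L(p_{s^e} p_{xu|s^e} p_{ys^as^d|xus^e}) - \epsilon - R|^+ \Big] \qquad (5.4)$$

for all $p_{s^e} \in \mathcal{P}^{[N]}_{S^e}$ and $p_{xu|s^e} \in \mathcal{P}^{[N]}_{XU|S^e}(L, D_1)$. Let

$$E^{\mathrm{CDMC}}_{r,L,N}(R) \triangleq \min_{p_{s^e}} \max_{p_{xu|s^e}} E^{\mathrm{CDMC}}_{r,L,N}(R, p_{s^e}, p_{xu|s^e}), \qquad (5.5)$$

which differs from (3.3) in that the optimizations are performed over empirical p.m.f.'s instead of arbitrary p.m.f.'s.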

Consider the maximization over the conditional type $p_{xu|s^e}$ (viewed as a function of $p_{s^e}$) in (5.5). As a result of this optimization, we may associate the following:
• to any type $p_{s^e}$, a type class $T^*_U(p_{s^e}) = T_u$ and a mutual information $I^*_{US^e}(p_{s^e}) = I_{US^e}(p_{u|s^e}\, p_{s^e})$;

• to any sequence $s^e$, a conditional type class $T^*_{U|S^e}(s^e) = T_{u|s^e}$;

• to any sequences $s^e$ and $u \in T^*_{U|S^e}(s^e)$, a conditional type class $T^*_{X|US^e}(u, s^e) = T_{x|us^e}$.

A random codebook $\mathcal{C}$ for $U$ is the union of codebooks $\mathcal{C}(p_{s^e})$ indexed by the state sequence type $p_{s^e}$ (recall there is a polynomial number of types). The codebook $\mathcal{C}(p_{s^e})$ is obtained by a) drawing $2^{N(R+\rho(p_{s^e}))}$ random vectors independently from the uniform distribution over $T^*_U(p_{s^e})$, and b) arranging them in an array with $2^{NR}$ columns and $2^{N\rho(p_{s^e})}$ rows. The design of the function $\rho(p_{s^e})$ is arbitrary at this point but will be optimized later.

Encoder. The encoding (given $s^e$ and $m$) proceeds in two steps.

1. Find $l$ such that $u(l, m) \in \mathcal{C}(p_{s^e}) \cap T^*_{U|S^e}(s^e)$. If more than one such $l$ exists, pick one of them randomly (with uniform distribution). Let $u = u(l, m)$. If no such $l$ can be found, generate $u$ uniformly from the conditional type class $T^*_{U|S^e}(s^e)$.

2. Generate $X$ uniformly distributed over the conditional type class $T^*_{X|US^e}(u, s^e)$.

Clearly, the p.m.f. of $(S^e, U, X)$, conditioned on its joint type, is uniform, and the encoder's maximum-cost constraint is satisfied.
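A toy-scale sketch of the codebook construction and of encoder step 1 follows. It is illustrative only: the binary alphabets, the block length $N = 8$, the array dimensions (4 columns, 16 rows) and the target joint type (standing in for the optimizing $p_{u|s^e}$) are arbitrary assumptions, and all function names are hypothetical. Step 2, drawing $X$ from $T^*_{X|US^e}(u, s^e)$, would follow the same pattern as `draw_from_conditional_class`.

```python
import random
from collections import Counter

def joint_type(a, b):
    """Joint type (as counts) of two equal-length sequences."""
    return Counter(zip(a, b))

def draw_from_type_class(counts, rng):
    """Uniform draw from the type class with the given symbol counts."""
    seq = [sym for sym, c in counts.items() for _ in range(c)]
    rng.shuffle(seq)
    return tuple(seq)

def draw_from_conditional_class(se, target, rng):
    """Uniform draw of u from T_{U|S^e}(s^e), the conditional type class fixed
    by the target joint type of (u, s^e)."""
    u = [None] * len(se)
    for s in set(se):
        positions = [i for i, v in enumerate(se) if v == s]
        symbols = [a for (a, b), c in target.items() if b == s for _ in range(c)]
        rng.shuffle(symbols)
        for i, a in zip(positions, symbols):
            u[i] = a
    return tuple(u)

def build_array(u_counts, n_cols, n_rows, rng):
    """Codebook C(p_{s^e}): n_rows x n_cols codewords drawn i.i.d. from T_U."""
    return [[draw_from_type_class(u_counts, rng) for _ in range(n_cols)]
            for _ in range(n_rows)]

def encode_step1(array, m, se, target, rng):
    """Encoder step 1: return (l, u). l is None when no row of column m has the
    target joint type with s^e; u is then drawn from T_{U|S^e}(s^e)."""
    rows = [l for l, row in enumerate(array) if joint_type(row[m], se) == target]
    if not rows:
        return None, draw_from_conditional_class(se, target, rng)
    l = rng.choice(rows)
    return l, array[l][m]

# Toy instance: N = 8, 2^{NR} = 4 columns, 2^{N rho} = 16 rows (illustrative values).
rng = random.Random(0)
se = (0, 0, 0, 0, 1, 1, 1, 1)
target = joint_type((0, 0, 0, 1, 1, 1, 1, 0), se)  # u agrees with s^e in 6 of 8 places
array = build_array(Counter({0: 4, 1: 4}), n_cols=4, n_rows=16, rng=rng)
results = [encode_step1(array, m, se, target, rng) for m in range(4)]
print(results)
```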

Decoder. Given $(y, s^d)$, the decoder seeks $u \in \mathcal{C} = \bigcup_{p_{s^e}} \mathcal{C}(p_{s^e})$ that maximizes the penalized empirical mutual information criterion

$$\max_{p_{s^e}} \max_{u \in \mathcal{C}(p_{s^e})} \left[ I(u; y s^d) - \psi(p_{s^e}) \right]. \qquad (5.6)$$

The decoder declares an error if maximizers with different column indices are found. Otherwise the decoder outputs the column index of $u$. The penalty function $\psi(\cdot)$ in (5.6) will soon be optimized, resulting in the "matched design" $\psi = \rho$.

We now analyze the probability of error

$$P_e = \max_{p_{Y|XS^a}} P_e(F_N, G_N, p_{Y|XS^a})$$

of the decoder.

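Step 1. An encoding error arises under the following event:

$$\mathcal{E}_m = \{(\mathcal{C}, s^e) : u(l,m) \in \mathcal{C} \text{ and } u(l,m) \notin T^*_{U|S^e}(s^e) \text{ for } 1 \le l \le 2^{N\rho(p_{s^e})}\} \qquad (5.7)$$

conditioned on message $m$ being selected. The probability that a vector $U$ uniformly distributed over $T^*_U(p_{s^e})$ also belongs to $T^*_{U|S^e}(s^e)$ is equal to $\exp_2\{-N I^*_{US^e}(p_{s^e})\}$ on the exponential scale. Therefore

$$\begin{aligned} \Pr[\mathcal{E}_m \mid T_{s^e}] &= \left(1 - \Pr[U \in T^*_{U|S^e}(s^e) \mid U \sim \mathrm{Unif}(T^*_U(p_{s^e}))]\right)^{2^{N\rho(p_{s^e})}} \\ &\doteq \left(1 - 2^{-N I^*_{US^e}(p_{s^e})}\right)^{2^{N\rho(p_{s^e})}} \\ &\le \exp\{-2^{N[\rho(p_{s^e}) - I^*_{US^e}(p_{s^e})]}\} \qquad (5.8) \\ &\le \begin{cases} \exp\{-2^{N\epsilon}\}, & \text{if } \rho(p_{s^e}) \ge I^*_{US^e}(p_{s^e}) + \epsilon; \\ 1, & \text{else.} \end{cases} \qquad (5.9) \end{aligned}$$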
The inequality (5.8) follows from $1 + a \le e^a$. The double-exponential term in (5.9) vanishes faster than any exponential function.
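The bound (5.8) can be checked numerically in the log domain, which avoids underflow because the quantities involved are doubly exponentially small. The values of $N$, $I^*$ and $\rho$ below are arbitrary illustrations, and the function names are hypothetical.

```python
import math

def log2_encoding_failure(n, i_star, rho):
    """log2 of (1 - 2^{-N I*})^{2^{N rho}}, the expression bounded in (5.8)."""
    return (2.0 ** (n * rho)) * math.log1p(-(2.0 ** (-n * i_star))) / math.log(2)

def log2_double_exponential_bound(n, i_star, rho):
    """log2 of exp{-2^{N(rho - I*)}}, the right side of (5.8)."""
    return -(2.0 ** (n * (rho - i_star))) / math.log(2)

# Illustrative values: I* = 0.3 bit and rho = I* + eps with eps = 0.05.
for n in (20, 40, 80):
    exact = log2_encoding_failure(n, 0.3, 0.35)
    bound = log2_double_exponential_bound(n, 0.3, 0.35)
    print(n, exact, bound)
```

With $\rho = I^* + \epsilon$, the log-probability bound is $-2^{N\epsilon}/\ln 2$, so the encoding-failure probability decays doubly exponentially in $N$, although slowly for small $\epsilon$.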

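Step 2. We have a decoding error under the following event $\mathcal{E}'_m$: conditioned on message $m$ being selected, there exists $u'$ not in column $m$ of an array $\mathcal{C}(p'_{s^e})$ such that

$$I(u'; y s^d) - \psi(p'_{s^e}) \ge I(u; y s^d) - \psi(p_{s^e}).$$

Therefore

$$\begin{aligned} P_e &= \max_{p_{Y|XS^a}} \Pr[\text{error} \mid m = 1, p_{Y|XS^a}] \\ &= \max_{p_{Y|XS^a}} \sum_{T_{suxy}} \Pr[T_{suxy}]\, \Pr[\text{error} \mid T_{suxy}, m = 1] \\ &\le \max_{p_{Y|XS^a}} \sum_{T_{suxy}} \Pr[T_{suxy}] \left(\Pr[\mathcal{E}_1 \mid T_{suxy}] + \Pr[\mathcal{E}'_1 \mid T_{suxy}, \mathcal{E}^c_1]\right) \\ &= \max_{p_{Y|XS^a}} \sum_{T_{suxy}} \Pr[T_{suxy}] \left(\Pr[\mathcal{E}_1 \mid T_{s^e}] + \Pr[\mathcal{E}'_1 \mid T_{suxy}, \mathcal{E}^c_1]\right) \qquad (5.10) \\ &\doteq \max_{p_{Y|XS^a}} \max_{T_{suxy}} \Pr[T_{suxy}] \left(\Pr[\mathcal{E}_1 \mid T_{s^e}] + \Pr[\mathcal{E}'_1 \mid T_{suxy}, \mathcal{E}^c_1]\right). \qquad (5.11) \end{aligned}$$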



We will see in Step 3 that $\Pr[\mathcal{E}'_1 \mid T_{suxy}, \mathcal{E}^c_1]$ does not depend on $p_{Y|XS^a}$ for the MPMI decoder. Using the asymptotic relations $P^N_Z(T_z) \doteq \exp_2\{-N D(p_z \,\|\, p_Z)\}$ and $P^N_{Z|V}(T_{z|v}) \doteq \exp_2\{-N D(p_{z|v} \,\|\, p_{Z|V} \mid p_v)\}$ [US], we derive

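$$\begin{aligned} \Pr[T_{suxy}] &= P^N_{S} P^N_{XU|S^e} P^N_{Y|XS^a}(T_{suxy}) \\ &\doteq \exp_2\{-N D(p_{suxy} \,\|\, p_S\, p_{XU|S^e}\, p_{Y|XS^a})\} \\ &= \exp_2\{-N D(p_{s^e} p_{xu|s^e} p_{ys^as^d|xus^e} \,\|\, p_S\, p_{XU|S^e}\, p_{Y|XS^a})\}. \end{aligned} \qquad (5.12)$$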



Step 3. Next we evaluate $\Pr[\mathcal{E}'_1 \mid T_{suxy}, \mathcal{E}^c_1]$, which can be written as $\Pr[\mathcal{E}'_1 \mid T_{suxy}, u, y, s^d, \mathcal{E}^c_1]$, where $(u, y, s^d)$ is an arbitrary member of the conditional type class $T_{uys^d|xs^es^a}$.

Denote by $p_e(u, y, s^d, p'_{s^e}, T_{suxy})$ the probability that the decoder outputs the codeword in row $l'$ and column $m' \ne 1$ of the array $\mathcal{C}(p'_{s^e})$, conditioned on $u$, $y$, $s^d$, and $T_{suxy}$. This conditional error probability is independent of $(l', m')$. We have

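$$\Pr[\mathcal{E}'_1 \mid T_{suxy}, u, y, s^d, \mathcal{E}^c_1] = 1 - \prod_{p'_{s^e}} \left(1 - p_e(u, y, s^d, p'_{s^e}, T_{suxy})\right)^{2^{N\rho(p'_{s^e})}(2^{NR} - 1)} \qquad (5.13)$$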



where

$$p_e(u, y, s^d, p'_{s^e}, T_{suxy}) = \sum_{u' \in \mathcal{U}_e(u, y, s^d, p'_{s^e}, p_{suxy})} P(u' \mid p'_{s^e}) = \frac{|\mathcal{U}_e(u, y, s^d, p'_{s^e}, p_{suxy})|}{|T^*_U(p'_{s^e})|} \qquad (5.14)$$
and

$$\mathcal{U}_e(u, y, s^d, p'_{s^e}, p_{suxy}) = \{u' \in T^*_U(p'_{s^e}) : I(u'; y s^d) - \psi(p'_{s^e}) \ge I(u; y s^d) - \psi(p_{s^e})\} \qquad (5.15)$$

is the set of codewords $u'$ in the array indexed by $p'_{s^e}$ that cause a decoding error, conditioned on $u$, $y$, $s^d$, and $T_{suxy}$. Also define the corresponding set of conditional types

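$$\begin{aligned} \mathcal{T}_e(u, y, s^d, p'_{s^e}, p_{suxy}) &= \{T_{u'|ys^d} : T_{u'} = T^*_U(p'_{s^e}),\ I(u'; y s^d) - \psi(p'_{s^e}) \ge I(u; y s^d) - \psi(p_{s^e})\} \\ &\subseteq \{T_{u'|ys^d} : I(u'; y s^d) - \psi(p'_{s^e}) \ge I(u; y s^d) - \psi(p_{s^e})\}. \end{aligned} \qquad (5.16)$$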



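Therefore

$$\begin{aligned} p_e(u, y, s^d, p'_{s^e}, T_{suxy}) &= \sum_{T_{u'|ys^d} \in \mathcal{T}_e(u, y, s^d, p'_{s^e}, p_{suxy})} \frac{|T_{u'|ys^d}|}{|T_{u'}|} \\ &\doteq \sum_{T_{u'|ys^d} \in \mathcal{T}_e(u, y, s^d, p'_{s^e}, p_{suxy})} 2^{-N I(u'; y s^d)} \\ &\,\dot{\le}\, 2^{-N[I(u; y s^d) - \psi(p_{s^e}) + \psi(p'_{s^e})]} \end{aligned} \qquad (5.17)$$

because $|T_{u'|ys^d}| / |T_{u'}| \doteq 2^{-N I(u'; y s^d)}$, and the number of conditional types $T_{u'|ys^d}$ is polynomial in $N$.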
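The type-counting step can be illustrated numerically: for fixed sequences, $\log_2(|T_{u|v}|/|T_u|) + N I(u; v)$ stays within $O(\log N)$. The sketch checks this against the standard bounds $(N+1)^{-|\mathcal{U}||\mathcal{V}|} 2^{NH(U|V)} \le |T_{u|v}| \le 2^{NH(U|V)}$ and $(N+1)^{-|\mathcal{U}|} 2^{NH(U)} \le |T_u| \le 2^{NH(U)}$; here $u$ plays the role of $u'$, $v$ that of the pair sequence $(y, s^d)$, and the sequences are arbitrary test data.

```python
import math
from collections import Counter

def log2_factorial(n):
    return math.lgamma(n + 1) / math.log(2)

def log2_type_class_size(u):
    """log2 |T_u| = log2 [ N! / prod_a n(a)! ]."""
    return log2_factorial(len(u)) - sum(log2_factorial(c) for c in Counter(u).values())

def log2_conditional_type_class_size(u, v):
    """log2 |T_{u|v}| = sum_b log2 [ n(b)! / prod_a n(a, b)! ]."""
    return (sum(log2_factorial(c) for c in Counter(v).values())
            - sum(log2_factorial(c) for c in Counter(zip(u, v)).values()))

def empirical_mi(u, v):
    n = len(u)
    pu, pv, puv = Counter(u), Counter(v), Counter(zip(u, v))
    return sum(c / n * math.log2(c * n / (pu[a] * pv[b])) for (a, b), c in puv.items())

n = 3000
u = [i % 2 for i in range(n)]
v = [u[i] if i % 5 else 2 for i in range(n)]   # v copies u except at every fifth position
gap = log2_conditional_type_class_size(u, v) - log2_type_class_size(u) + n * empirical_mi(u, v)
print(gap / n)  # per-letter gap; it shrinks like (log N) / N
```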
Next we use the following inequality, which is proved in Appendix F:



$$1 - \prod_i (1 - a_i)^{U_i} \le \min\Big\{1, \sum_i U_i a_i\Big\}, \qquad 0 \le a_i \le 1,\ U_i \ge 1. \qquad (5.18)$$
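The inequality $1 - \prod_i (1 - a_i)^{U_i} \le \min\{1, \sum_i U_i a_i\}$ ($0 \le a_i \le 1$, $U_i \ge 1$) follows from Bernoulli's inequality $(1 - a_i)^{U_i} \ge 1 - U_i a_i$ together with $\prod_i (1 - b_i) \ge 1 - \sum_i b_i$ for $b_i \in [0, 1]$. The randomized spot check below is not a proof, and its sampling ranges are arbitrary; small $a_i$ and large $U_i$ mimic the regime of (5.13).

```python
import random

def lhs(a, U):
    """1 - prod_i (1 - a_i)^{U_i}."""
    prod = 1.0
    for ai, ui in zip(a, U):
        prod *= (1.0 - ai) ** ui
    return 1.0 - prod

def rhs(a, U):
    """min{1, sum_i U_i a_i}."""
    return min(1.0, sum(ui * ai for ai, ui in zip(a, U)))

rng = random.Random(0)
worst = -1.0
for _ in range(20000):
    k = rng.randint(1, 5)
    a = [rng.random() ** 4 for _ in range(k)]        # many small a_i
    U = [1.0 + 100.0 * rng.random() for _ in range(k)]
    worst = max(worst, lhs(a, U) - rhs(a, U))
print(worst)  # never positive, up to rounding
```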

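Applying (5.18) and (5.17) successively to (5.13), we obtain

$$\begin{aligned} \Pr[\mathcal{E}'_1 \mid T_{suxy}, \mathcal{E}^c_1] &= \Pr[\mathcal{E}'_1 \mid T_{suxy}, u, y, s^d, \mathcal{E}^c_1] \\ &\le \min\Big\{1, \sum_{p'_{s^e}} p_e(u, y, s^d, p'_{s^e}, T_{suxy})\, 2^{N\rho(p'_{s^e})}(2^{NR} - 1)\Big\} \\ &\,\dot{\le}\, \min\Big\{1, \sum_{p'_{s^e}} 2^{-N[I(u; y s^d) - \psi(p_{s^e}) + \psi(p'_{s^e}) - \rho(p'_{s^e}) - R]}\Big\} \\ &\doteq \exp_2\Big\{-N\Big|I(u; y s^d) - \psi(p_{s^e}) + \min_{p'_{s^e}}[\psi(p'_{s^e}) - \rho(p'_{s^e})] - R\Big|^+\Big\}. \end{aligned} \qquad (5.19)$$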

Step 4. Combining (5.19) and (5.9), we obtain

$$\Pr[\mathcal{E}_1 \mid T_{s^e}] + \Pr[\mathcal{E}'_1 \mid T_{suxy}, \mathcal{E}^c_1] \,\dot{\le}\, \exp_2\{-N\, T(R, \rho, \psi, p_{s^e}, p_{xu|s^e}, p_{ys^as^d|xus^e})\} \qquad (5.20)$$






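where we have defined the function

$$T(R, \rho, \psi, p_{s^e}, p_{xu|s^e}, p_{ys^as^d|xus^e}) = \begin{cases} \big|I(u; y s^d) - \psi(p_{s^e}) + \min_{p'_{s^e}}[\psi(p'_{s^e}) - \rho(p'_{s^e})] - R\big|^+, & \text{if } \rho(p_{s^e}) \ge I^*_{US^e}(p_{s^e}) + \epsilon; \\ 0, & \text{else.} \end{cases}$$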

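Applying the inequality $\min_{p'_{s^e}} F(p'_{s^e}) \le F(p_{s^e})$ to the function $F = \psi - \rho$, we obtain

$$T(R, \rho, \psi, p_{s^e}, p_{xu|s^e}, p_{ys^as^d|xus^e}) \le \begin{cases} |I(u; y s^d) - \rho(p_{s^e}) - R|^+, & \text{if } \rho(p_{s^e}) \ge I^*_{US^e}(p_{s^e}) + \epsilon; \\ 0, & \text{else,} \end{cases} \qquad (5.21)$$

and thus, since $J_L(p_{s^e} p_{xu|s^e} p_{ys^as^d|xus^e}) = I(u; y s^d) - I^*_{US^e}(p_{s^e})$,

$$T(R, \rho, \psi, p_{s^e}, p_{xu|s^e}, p_{ys^as^d|xus^e}) \le |J_L(p_{s^e} p_{xu|s^e} p_{ys^as^d|xus^e}) - \epsilon - R|^+ \qquad (5.22)$$

with equality when

$$\psi(p_{s^e}) = \rho(p_{s^e}) = I^*_{US^e}(p_{s^e}) + \epsilon. \qquad (5.23)$$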
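Combining (5.11), (5.12), (5.20), and (5.22), we obtain

$$\begin{aligned} P_e &\,\dot{\le}\, \max_{p_{Y|XS^a}} \max_{T_{suxy}} \Pr[T_{suxy}]\left(\Pr[\mathcal{E}_1 \mid T_{s^e}] + \Pr[\mathcal{E}'_1 \mid T_{suxy}, \mathcal{E}^c_1]\right) \\ &\,\dot{\le}\, \max_{p_{s^e}} \min_{p_{xu|s^e}} \min_{\rho, \psi} \max_{p_{ys^as^d|xus^e}} \max_{p_{Y|XS^a} \in \mathcal{A}} \exp_2\{-N[D(p_{s^e} p_{xu|s^e} p_{ys^as^d|xus^e} \,\|\, p_S\, p_{XU|S^e}\, p_{Y|XS^a}) + T(R, \rho, \psi, p_{s^e}, p_{xu|s^e}, p_{ys^as^d|xus^e})]\} \qquad (5.24) \\ &= \exp_2\{-N E^{\mathrm{CDMC}}_{r,L,N}(R)\} \qquad (5.25) \end{aligned}$$

where (5.24) holds because $p_{xu|s^e}$ and $(\rho, \psi)$ can be optimized to achieve the exponent $E^{\mathrm{CDMC}}_{r,L,N}(R)$ in (5.5).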

Step 5. By Lemma 3.1, the function $E^{\mathrm{CDMC}}_{r,L}(R)$ is nonnegative and upper bounded by $|C_L - R|^+$. Applying (2.13) with $\tilde{p}_{S^e}$, $p_{XU|S^e}$, $(\tilde{p}_{YS^aS^d|XUS^e}, p_{Y|XS^a})$, and $D + |J_L - R|^+$ in the roles of the variables $p$, $q$, $r$, and the functional $\phi$, respectively, we conclude that the exponent $E^{\mathrm{CDMC}}_{r,L,N}(R)$ in (5.25) converges to the limit $E^{\mathrm{CDMC}}_{r,L}(R)$ in (3.3) as $N \to \infty$. Since $E^{\mathrm{CDMC}}_{r,L}(R) > 0$ for all $R < C_L$, the probability of error vanishes if $R < C_L$. The claim follows from the fact that we can choose $L$ such that $C_L > C - \epsilon$, for any arbitrarily small $\epsilon$.



6 Proof of Theorem 3.4

The proof is similar to the proof of Theorem 3.2 and is again given for the maximum-cost constraint (2.1) on the transmitter. A random ensemble $\mathcal{E}$ of binning codes $(F_N, G_N)$ with fixed $|\mathcal{U}| = L$ is constructed. This ensemble may also be viewed as a random ensemble of RM codes (Def. 2.3). RM codes are obtained by selecting a prototype $(f_N, g_N)$ from $\mathcal{E}$ and generating the RM family according to Def. 2.3. For RM codes there is no loss of optimality in restricting the attack channel to a class of channels that are uniform over conditional types (see Step 2 below). It is shown that the error probability averaged over the ensemble $\mathcal{E}$ vanishes exponentially with $N$ at the rate $E^{\mathrm{CAM}}_{r,L}(R)$ given in (3.12). Since the class of attack channels considered in Step 2 has polynomial complexity, there exists an RM code that achieves $E^{\mathrm{CAM}}_{r,L}(R)$ for all attack channels in $\mathcal{A}$.

The codebook-generation, encoding and decoding procedures are the same as those in the 
CDMC case, with the difference that the types and conditional types generated/selected by the 
encoder are obtained by optimizing a slightly different payoff function. The probability of error 
analysis is similar as well. 

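Assume $R < C_L - \epsilon$. Define

$$E^{\mathrm{CAM}}_{r,L,N}(R, p_{s^e}, p_{xu|s^e}) \triangleq \min_{p_{ys^as^d|xus^e} \in \mathcal{P}^{[N]}_{YS^aS^d|XUS^e}[\mathcal{A}, p_{xu|s^e}, p_{s^e}]} \Big[ D(p_{s^es^as^d} \,\|\, p_{S^eS^aS^d}) + I_{Y;US^eS^d|XS^a}(p_{s^e} p_{xu|s^e} p_{ys^as^d|xus^e}) + |J_L(p_{s^e} p_{xu|s^e} p_{ys^as^d|xus^e}) - \epsilon - R|^+ \Big] \qquad (6.1)$$

for all $p_{s^e} \in \mathcal{P}^{[N]}_{S^e}$ and $p_{xu|s^e} \in \mathcal{P}^{[N]}_{XU|S^e}(L, D_1)$. Let

$$E^{\mathrm{CAM}}_{r,L,N}(R) \triangleq \min_{p_{s^e}} \max_{p_{xu|s^e}} E^{\mathrm{CAM}}_{r,L,N}(R, p_{s^e}, p_{xu|s^e}), \qquad (6.2)$$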

which differs from (3.12) in that the optimizations are performed over empirical p.m.f.'s instead of arbitrary p.m.f.'s. Consider the maximization over $p_{xu|s^e}$ (viewed as a function of $p_{s^e}$) in (6.2). As in the proof of Theorem 3.2, to the resulting optimal $p_{xu|s^e}$ we can associate a type class $T^*_U(p_{s^e})$, conditional type classes $T^*_{U|S^e}(s^e)$ and $T^*_{X|US^e}(u, s^e)$, and a mutual information $I^*_{US^e}(p_{s^e} p_{u|s^e})$.

Define $\rho(p_{s^e})$ and $\psi(p_{s^e})$ as in (5.23). The random codebook $\mathcal{C}$ is a stack of codebooks $\mathcal{C}(p_{s^e})$, each of which is obtained by a) drawing $2^{N(R+\rho(p_{s^e}))}$ independent random vectors uniformly distributed over $T^*_U(p_{s^e})$, and b) arranging them in an array with $2^{NR}$ columns and $2^{N\rho(p_{s^e})}$ rows.

Encoder. The encoding (given $s^e$ and $m$) proceeds exactly as in the CDMC case:

1. Find $l$ such that $u(l, m) \in \mathcal{C}(p_{s^e}) \cap T^*_{U|S^e}(s^e)$. If more than one such $l$ exists, pick one of them randomly (with uniform distribution). Let $u = u(l, m)$. If no such $l$ can be found, generate $u$ uniformly from the conditional type class $T^*_{U|S^e}(s^e)$.

2. Generate $X$ uniformly distributed over the conditional type class $T^*_{X|US^e}(u, s^e)$.

Decoder. The decoder is the MPMI decoder of (5.6). We now analyze its probability of error

$$P_e = \max_{p_{Y|XS^a} \in \mathcal{P}_{Y|XS^a}[\mathcal{A}]} P_e(F_N, G_N, p_{Y|XS^a}).$$

Step 1. An encoding error arises when no codeword with the appropriate type can be found. The probability $\Pr[\mathcal{E}_m \mid T_{s^e}]$ of this event is bounded by (5.9).
Step 2. We have a decoding error under the following event $\mathcal{E}'_m$: there exists $u'$ not in column $m$ of an array $\mathcal{C}(p'_{s^e})$ such that $I(u'; y s^d) - \rho(p'_{s^e}) \ge I(u; y s^d) - \rho(p_{s^e})$. Therefore

$$\begin{aligned} P_e &= \max_{p_{Y|XS^a}} \Pr[\text{error} \mid m = 1] \\ &= \max_{p_{Y|XS^a}} \sum_{T_{suxy}} \Pr[T_{suxy}]\, \Pr[\text{error} \mid T_{suxy}, m = 1] \\ &\le \max_{p_{Y|XS^a}} \sum_{T_{suxy}} \Pr[T_{suxy}] \left(\Pr[\mathcal{E}_1 \mid T_{s^e}] + \Pr[\mathcal{E}'_1 \mid T_{suxy}, \mathcal{E}^c_1]\right). \end{aligned} \qquad (6.3)$$



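Unlike (5.11), no dependency on a DMC $p_{Y|XS^a}$ appears here. Observe that

$$\Pr[\mathcal{E}'_1 \mid T_{suxy}, \mathcal{E}^c_1] = \sum_{(s,u,x,y) \in T_{suxy}} p(s \mid T_s)\, p(x, u \mid s^e)\, p_{Y|XS^a}(y \mid x, s^a)\, \Pr[\mathcal{E}'_1 \mid y, s^d, T_{suxy}, \mathcal{E}^c_1]. \qquad (6.4)$$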



Here we can apply the following argument from [15]. From (6.3) and (6.4), we see that $P_e(F_N, G_N, p_{Y|XS^a})$ is an affine functional of $p_{Y|XS^a}$. Moreover, it can be verified that $P_e(F_N, G_N, p_{Y|XS^a}) = P_e(F_N, G_N, p^{\pi}_{Y|XS^a})$, where $\pi$ is a permutation operator and $p^{\pi}_{Y|XS^a}(y|x, s^a) = p_{Y|XS^a}(\pi y \mid \pi x, \pi s^a)$. By uniform averaging over all permutations $\pi$, we obtain an attack channel $\bar{p}_{Y|XS^a} = \frac{1}{N!}\sum_{\pi} p^{\pi}_{Y|XS^a}$, which is strongly exchangeable: if $(X, S^a)$ is uniformly distributed over a type class, then $Y$ is uniformly distributed over conditional type classes. So without loss of optimality for the adversary, we can consider only strongly exchangeable channels in the analysis, for which $p_{Y|XS^a}(y|x, s^a)$ is given by

$$p_{Y|XS^a}(y \mid x, s^a) = \frac{\Pr[T_{y|xs^a}]}{|T_{y|xs^a}|}, \qquad (6.5)$$

with $T_{y|xs^a}$ to be optimized. Using the upper bound

$$\Pr[T_{y|xs^a}] \le \mathbb{1}\{p_{y|xs^a} \in \mathcal{A}\} \qquad (6.6)$$

and the asymptotic relations $\Pr[T_s] \doteq \exp_2\{-N D(p_s \,\|\, p_S)\}$ and $|T_{y|wz}| / |T_{y|z}| \doteq \exp_2\{-N I(y; w \mid z)\}$, we obtain

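$$\begin{aligned} \Pr[T_{suxy}] &= P^N_{S} P^N_{XU|S^e} P^N_{Y|XS^a}(T_{suxy}) \\ &= \frac{|T_{suxy}|}{|T_s|\,|T_{xu|s^e}|\,|T_{y|xs^a}|}\, \Pr[T_s]\, \Pr[T_{y|xs^a}] \\ &\le \frac{|T_{suxy}|}{|T_s|\,|T_{xu|s^e}|\,|T_{y|xs^a}|}\, \Pr[T_s]\, \mathbb{1}\{p_{y|xs^a} \in \mathcal{A}\} \\ &\doteq \exp_2\{-N[D(p_s \,\|\, p_S) + I_{Y;US^eS^d|XS^a}(p_{s^e} p_{xu|s^e} p_{ys^as^d|xus^e})]\}\, \mathbb{1}\{p_{y|xs^a} \in \mathcal{A}\}. \end{aligned} \qquad (6.7)$$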
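A small sketch of a strongly exchangeable attack channel of the form (6.5) follows, for the special case $\Pr[T_{y|xs^a}] = 1$ on a single conditional type class. It is an illustration with assumed toy alphabets, not code from the paper: within each group of positions sharing the same $(x_i, s^a_i)$, a multiset of outputs whose counts are fixed by the chosen conditional type is randomly permuted, which makes every sequence of the conditional type class equally likely. The function name and the `cond_counts` layout are hypothetical.

```python
import random
from collections import Counter

def sample_strongly_exchangeable(x, sa, cond_counts, rng):
    """Draw y uniformly from the conditional type class T_{y|x s^a}.

    cond_counts maps each input pair (x_i, s^a_i) to a Counter giving how many
    of the positions carrying that pair must receive each output symbol.
    """
    pairs = list(zip(x, sa))
    y = [None] * len(x)
    for pair in set(pairs):
        positions = [i for i, p in enumerate(pairs) if p == pair]
        symbols = [s for s, c in cond_counts[pair].items() for _ in range(c)]
        if len(symbols) != len(positions):
            raise ValueError("cond_counts must match the joint type of (x, s^a)")
        rng.shuffle(symbols)  # uniform over arrangements within this group
        for i, s in zip(positions, symbols):
            y[i] = s
    return tuple(y)

rng = random.Random(0)
x = (0, 0, 0, 0, 1, 1, 1, 1)
sa = (0,) * 8
cond_counts = {(0, 0): Counter({0: 3, 1: 1}), (1, 0): Counter({1: 3, 0: 1})}
draws = Counter(sample_strongly_exchangeable(x, sa, cond_counts, rng) for _ in range(4000))
print(len(draws))  # 16: the 4 x 4 equally likely sequences of the class
```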

Step 3. This step is identical to the corresponding step in the DMC case and yields

$$\Pr[\mathcal{E}'_1 \mid T_{suxy}, \mathcal{E}^c_1] \,\dot{\le}\, \exp_2\{-N|J_L(p_{s^e} p_{xu|s^e} p_{ys^as^d|xus^e}) - \epsilon - R|^+\}. \qquad (6.8)$$
Step 4. Combining (6.3), (6.7), and (6.8), we obtain

$$\begin{aligned} P_e &\,\dot{\le}\, \sum_{T_{suxy}} \Pr[T_{suxy}]\, \Pr[\mathcal{E}'_1 \mid T_{suxy}, \mathcal{E}^c_1] \\ &\,\dot{\le}\, \max_{p_{s^e}} \min_{p_{xu|s^e}} \max_{p_{ys^as^d|xus^e}} \exp_2\Big\{-N\Big[D(p_{s^es^as^d} \,\|\, p_{S^eS^aS^d}) + I_{Y;US^eS^d|XS^a}(p_{s^e} p_{xu|s^e} p_{ys^as^d|xus^e}) + |J_L(p_{s^e} p_{xu|s^e} p_{ys^as^d|xus^e}) - \epsilon - R|^+\Big]\Big\} \\ &= \exp_2\{-N E^{\mathrm{CAM}}_{r,L,N}(R)\} \end{aligned} \qquad (6.9)$$

because $p_{xu|s^e}$ was optimized to achieve the exponent $E^{\mathrm{CAM}}_{r,L,N}(R)$ in (6.2).

Step 5. The last step is identical to that in the DMC case: the exponent $E^{\mathrm{CAM}}_{r,L,N}(R)$ in (6.9) converges to the limit $E^{\mathrm{CAM}}_{r,L}(R)$ in (3.12) as $N \to \infty$, and all rates below $C_L$ are achievable. By choosing $L$ large enough, $C_L$ can be made arbitrarily close to $C$. □

7 Proof of Converse of Theorem 3.6

The proof of the converse theorem is an extension of [12, Prop. 4.3]. To prove the claim (derive an upper bound on capacity), we only need to consider the expected-cost constraint (2.2) for the transmitter. Indeed, replacing (2.2) with the stronger maximum-cost constraint (2.1) cannot increase capacity, so the same upper bound applies. Likewise, we assume as in [12] that the decoder knows the attack channel $p_{Y|XS^a}$, because the resulting upper bound on capacity applies to an uninformed decoder as well.

Step 1. Choose an arbitrarily small $\eta > 0$. For any rate-$R$ encoder $f_N$ and attack channel $p_{Y|XS^a} \in \mathcal{A}$ such that

$$I(M; YS^d) \le N(R - \eta), \qquad (7.1)$$

we have

$$\begin{aligned} NR = H(M) &= H(M \mid YS^d) + I(M; YS^d) \\ &\le 1 + P_e(f_N, g_N, p_{Y|XS^a})\, NR + I(M; YS^d) \\ &\le 1 + P_e(f_N, g_N, p_{Y|XS^a})\, NR + N(R - \eta), \end{aligned}$$

where the first inequality is due to Fano's inequality, and the second is due to (7.1). Hence

$$P_e(f_N, g_N, p_{Y|XS^a}) \ge \frac{N\eta - 1}{NR}.$$

We conclude that the probability of error is bounded away from zero:

$$P_e(f_N, g_N, p_{Y|XS^a}) \ge \frac{\eta}{2R} \qquad (7.2)$$

for all $N \ge 2/\eta$. Therefore rate $R$ is not achievable if

$$\min_{p_{Y|XS^a} \in \mathcal{A}} I(M; YS^d) \le N(R - \eta). \qquad (7.3)$$
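The last two steps of Step 1 are elementary arithmetic: rearranging the Fano chain gives $P_e \ge (N\eta - 1)/(NR)$, and this is at least $\eta/(2R)$ exactly when $N\eta/2 \ge 1$, i.e., $N \ge 2/\eta$. A short numerical check follows; the rates and $\eta$ values are arbitrary, and the function name is hypothetical.

```python
import math

def fano_lower_bound(n, rate, eta):
    """P_e >= (N*eta - 1) / (N*R), from NR <= 1 + P_e*N*R + N*(R - eta)."""
    return (n * eta - 1) / (n * rate)

# (N*eta - 1)/(N*R) >= eta/(2R)  <=>  N*eta/2 >= 1  <=>  N >= 2/eta.
for eta in (0.01, 0.05, 0.2):
    for rate in (0.25, 0.5, 1.0):
        n0 = math.ceil(2 / eta)
        assert all(fano_lower_bound(n, rate, eta) >= eta / (2 * rate) - 1e-12
                   for n in range(n0, n0 + 500))
print(fano_lower_bound(1000, 0.5, 0.05))  # 0.098
```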
Step 2. The joint p.m.f. of $(M, S, X, Y)$ is given by

$$p_{MSXY} = p_M\, p_{S^e}\, \mathbb{1}\{X = f_N(S^e, M)\} \prod_{i=1}^{N} p_{Y|XS^a}(y_i \mid x_i, s^a_i)\, p_{S^aS^d|S^e}(s^a_i, s^d_i \mid s^e_i). \qquad (7.4)$$

Define the random variables

$$W_i = (M, S^e_{i+1}, \ldots, S^e_N, S^d_1, \ldots, S^d_{i-1}, Y_1, \ldots, Y_{i-1}), \qquad 1 \le i \le N. \qquad (7.5)$$

Since $(M, \{S^e_j, S^d_j, Y_j\}_{j \ne i}) \to X_i S^e_i \to Y_i S^a_i S^d_i$ forms a Markov chain for any $1 \le i \le N$, so does

$$W_i \to X_i S^e_i \to Y_i S^a_i S^d_i. \qquad (7.6)$$

Also define the quadruple of random variables $(W, S, X, Y)$ as $(W_T, S_T, X_T, Y_T)$, where $T$ is a time-sharing random variable, uniformly distributed over $\{1, \ldots, N\}$ and independent of all the other random variables. The random variable $W$ is defined over an alphabet of cardinality $\exp_2\{N[R + \log\max(|\mathcal{S}^e|, |\mathcal{Y}|\,|\mathcal{S}^d|)]\}$. Due to (7.4) and (7.6), $W \to XS^e \to YS^aS^d$ forms a Markov chain.

Using the same inequalities as in [TJ Lemma 4] (with (y, Sf) and 5? playing the roles of Yj and 
5j, respectively), we obtain 

N 

/(M;YS d ) < Y^[I(Wi;YiSf)-I(Wi;Sf)]. (7.7) 
i=i 

Using the definition of (W, S, X,Y) above and the same inequalities as in [12], (C16)], we obtain 

N 

- /(Wi; Sf)] = N[I(W; YS d \T) - I(W; S e \T)} 

i=i 

< N[I{WT; YS d ) - I{WT; S e )} 

= N[I{U;YS d ) - I(U;S e )} (7.8) 

where U = (W, T) is defined over an alphabet of cardinality 

L(N) 4 N exp 2 {N[R + logmax(|S e |, |3>| \S d \)}}. 

Therefore 

I(M ; YS d ) < N[I(U;YS d ) - I(U;S e )} 
= NJ l(N )(p US xy) 
min I(M;YS d ) < N min J L (n)(PuSXy) 

< A^sup max min Jl(pusxy) 

L Pxu\se p Y \xs a <^A 

= N lim max min Jl(pusxy) (7-9) 

L^oo Pxt/|S e Py|xs a G.4 

where the last equality follows from Lemma 12. 11 Combining (|7.3p and (|7.9p . we conclude that R is 
not achievable if 

lim max min Jl(pusxy) < -R — ??, 
L— >oo Pxc/|s e py|xs n £"4 

which proves the claim, because rj is arbitrarily small. □ 
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8 Proof of Converse of Theorem 13.7 



The proof of the converse theorem builds on the proof for the C-DMC case. 

Step 1. Consider an attack channel P Y \xs a ^ na t ac hieves C{D\,A) in (|3.ip . Without loss of 
generality, assume that Ps a (s a ) > for all s a S S a . For any positive e, consider the following L\ 
neighborhood of P Y \xs a: 

= \py\xs- ■ Yl \PY\xs-(y\x,s a ) -p* YlxSa (y\x,s a )\ < e \ . 
k y,x,s a ) 

We have lim e ^o C{D\, B(e)) = C(Di,A). For any arbitrarily small rj, there exists e such that 

C(D 1 ,A)- V <C(D 1 ,B(e))<C(D 1 ,A). 

In order to prove the converse theorem, it is sufficient to show that reliable communication at 
rates R > C(D\, £>(e)) + 2rj > C(D\,A) + r\ is impossible for a particular attack channel PY|xs a £ 
^V|XS a [^( e )]- The channel we select is "nearly memoryless". Given any rate-i? randomized code 
(A4, Fn,Gn), we show that lini/v^oo -P e ,jv(-^V> Gn,Py\xs°-) > ^ hence is nonzero. 

Step 2: Construction of Py|xs°- Consider any rate-i? deterministic code (M, /n^Qn) where 
R > C(Di,A). From Theorem 13. 6^ we know that mm.f NtgN P e ,N{fN, 9N, (Py\xs a ^ N ^ -/->■ as N —>■ 
oo. Define an arbitrary mapping A : X N x (<S°) — > y N such that f>A(x.s a )|xs a £ B(e) for all 
(x, s a ). Denote by Y the output of (Pyixs^ Define the following functions of (y,x, s a ): the 
binary quantity 

B = l & P y lxs a £ B(e) 

and the sequence 

v f y : ifPy|xs« €B(e) (B = 0) 

y \ A(x,s a ) : else (B = 1). 1 j 

Therefore the p.m.f. 

\p YlxS «) N (y\*> sa ) Pr ( B = 0) +I{y = A(x,s a )}Pr( J B = 1)1 I{ PylXtS a G B(e)} 



PY|XS»(y|x,S a ) 



belongs to 6(e). 

Step 3. Now we seek an upper bound on Pr[B = 1]. Define the binary random variable A 
such that 

A = l ^ mm Pxs a(x,s a ) < e 2 . (8.2) 

x,s a 

The probability that A = 1 is a function of the code /at . We assume momentarily that /jv is such 
that 

Pr[A=l]<^L. (8.3) 

In Step 5 we show this assumption causes no loss of generality. 
Define the shorthand 

6 



f 6 



2 In 2 
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and the class of types 



V [N] (e) — I 
1 YXS a \ e ) — S 



Py xs a : D(Py\^ s a\\pY\ XS a\p^) > €, Him p xs a (x , S° ) > <? 



(8.4) 



A=0 



With this notation, we have 

B = l => \Py\ xs a(y\x,s a )-p* Ylxsa {y\x,s a )\> e 



y,x,s a 



B = I, A = 



3y|xs«Pxs« -Py|x5^xs«|| > e 2 ^ |p y|xs a s a ) - p Y \ XSa (y\x, s a )\ > e 3 



y,x,s a 



(8.5) 



=> £>(Py|xs" I \Py \XS" \Pxs a ) > 1 

where the first line follows from the definition of /3(e), the second line from (18, 3p . and the third line 
from Pinsker's inequality |16t p. 58]: D(p\\p') > \\p — p'\\ 2 /(2 In 2). 

We have 

Pi[B = 1] < Pr[B = 1, A = 0] + Pr[A = 1]. (8.6) 
Due to (|8.4p and (|8.5p . the first term in the right side is bounded as follows: 



Pr[B = l,A = 0] < Pr[^ePj5 s „(e; 



Pyxs a t/ yj^al^j 



^ (el 

< (AT + 1)1^1 1^1 l 5a l max Pr[Ty XS a] 

Pyxs° 

< (A T + i)\y\ i*i i 5a 

< (Ar + i)|yimi5 a l e xp 2 {-A^e} 



c<p [JV] /->. 



max exp 2 { -ND(p 9 \ xsa | b y , X5 a |p X s«) } 



(8.7) 



which vanishes as iV — > oo. Combining (|8.3p . (j8.6l) and (|8.7f) . we obtain 



Pr[B = l] < (AT + 1)1*1 1*1 2"^ + 



9R 



< 



8R 



for A' large enough. 

Step 4. For any (/jv,<7iv), we have 

Pe,N(fN,9N, (PY\XS a ) N ) 

= Pr[M j£ M\Y,S d ] 



Pr[M ^ M|Y, S d , £ = 0]Pr(5 = 0) + Pr[M / M|Y, S d , S = l]Pr(B = 1). (8.8) 
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Likewise, 

Pe,N (/jV i 9N j £*Y|XS a ) 
= Pr[M j£ M\Y,S d ] 

= Pr[M ^ M\Y,S d ,B = 0]Pr(B = 0) + Pr[M / M\Y,S d ,B = l]Pr(B = 1). (8.9) 

Noting that the terms multiplying Pr[B = 0] in (|8.8j) and (|8.9p are identical by construction of y 
in (|8.ip and that Pr[i? = 1] is upper bounded by we obtain 

\Pe,N(fN>9N,PY\XS«) ~ P e,N(fN , 9N , (p Y \XS°) N )\ ^ 2Pr [ B = 1] < VJn,9N- 

Since i? > C(D\,A) + r], Theorem 13.61 implies that 

lim min P e ,N(fN, 9n, (p y \xs*) N ) ^ 

Hence 

^/X Pe ^ (/7V ' 57V ' PY|xsa) " m ~ ik = Jr- 

Therefore 

lim min P ejN (F N , Gjv,Py|xs°) > t^- 

N^oo f N ,g N 4K 

for any randomized code (Fn,Gn). 

Step 5. It remains to prove there was no loss of generality in making the assumption (|8.3p . 
This is done as follows. Given any code /n (that may not satisfy (|8.3p ). we can extend the code by 
appending iVe| A' | — 1 letters x at the end of the sequence, for each x £ X. The resulting code has 
length (1 + e)N and will be denoted by fri+ e m- To this code we can associate a decoding function 
9(i+c)N that ignores the last eN letters of the received sequence and outputs the same decision as 
qn based on the first N received letters. Hence 

Pe,N(fN,9N,PY\XS a ) = Pe,N(f(l+e)N, 9(l+e)Ni PY\XS a )i Vp Y |XS«- (8.10) 

If /(i +e )7v satisfies (|8.3p . it follows from Step 4 that reliable communication is impossible using such 
codes, and from (|8.10p the same conclusion applies to fjy- 

We now show that /( 1+£ )tv satisfies (I8.3P for N large enough. Denote by N'(x,s a ) the number 
of occurrences of the pair (x, s a ) in the last eN letters of the joint sequence (x, s a ). Also denote by 
N'(s a ) the number of occurrences of s a in the last ej^l -1 N letters of the sequence s a . 

If S a + 0, we have 

Vx,s a : Pr[N'(x,s a ) < e 2 N] = Pr[N'(s a ) < t 2 N\. 

Since E[iV(,s a )] = Ne\X\~ 1 psa(s a ), the above probabilities vanish exponentially with N provided 
that e < mm s a pga(s a ). Hence 



minp xs a(x, s a ) < e 2 

x,s a 



Pr[A = 1] = Pr 

< \X\ \S a \ max Pr\p xs a(x,s a ) < e 2 ) 

x,s a 

< \X\\S a \ m&xPr[N'(x,s a ) < e 2 (l + e)N] 

x,s a 

= \X\ \S a \ m&xPr[N'(s a ) < e 2 (l + e)N]. 
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Thus (|8.3p holds for e(l + e) < \X\ 1 min s a ps<* (s a ) and N large enough. 

If S a = 0, straightforward changes in the above derivation yield Pr[A = 1] = 0, i.e., (|8,3p holds 
again. This concludes the proof. □ 

9 Discussion 

In their landmark paper, Gel'fand and Pinsker [1] showed that random binning achieves the capacity 
of a DMC with random states known to the encoder. However their encoder was not designed to 
provide positive error exponents at rates below capacity. In this paper we have addressed this 
limitation and proposed and optimized a new random-coding scheme. The codebook consists of a 
stack of codeword-arrays indexed by the encoder's state sequence type A. The size of these arrays 
is 2 Np ^ x 2^^, i.e., the number of rows is a function of A. The decoder is the Maximum Penalized 
Mutual Information decoder (|3.6p . where the penalty is the same function p(X) that determines the 
array sizes. This new MPMI decoder can be interpreted as an empirical generalized MAP decoder. 

The channel models studied in this paper generalize the original Gel'fand-Pinsker setup in two 
ways. First, partial information about the state sequence is available to the encoder, adversary, 
and decoder. Second, both CDMC and CAM channel models are studied. 

We have considered four combinations of maximum/expected cost constraints for the trans- 
mitter and CDMC/CAM designs for the adversary, and obtained the same capacity in all four 
cases. There is thus no advantage (in terms of capacity) to the transmitter in operating under 
expected-cost constraints instead of the stronger maximum-cost constraints. 

In terms of error exponents however, there is a definite advantage to the adversary in choosing 
a CDMC rather than a CAM design of the channel. This is because 1) arbitrary memory does not 
help the adversary because randomly-modulated codes and a MMI-type decoder are used, 2) the 
set of conditional types the adversary can choose from is constrained in the CAM case but not in 
the CDMC case, and 3) the error exponents are determined by the worst types. The random-coding 
exponent is always upper bounded by a straight line with slope —1 at all rates below capacity. That 
upper bound is achieved in the CAM case, when no side information is available to the encoder. 

Finally, neither the MMI nor the MPMI decoder is practical, and it remains to be seen whether 
good, practical encoders and decoders can be developed. 

Acknowledgements. The authors are grateful to M. Haroutunian, A. Lapidoth, N. Merhav, 
and P. Narayan for helpful comments. 
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A Relation Between CAM and AVC Models 



In this appendix, we detail the relation between a channel model Py|Xj with maximum distortion 
constraint (|2.5p . and the AVC model in [16]. The AVC is a family of conditional p.m.f.'s 9), 
where 9 G (finite set) is a "channel state" selected by the adversary. A cost function / : — > M + 
for the states is also defined. The channel law is of the form 

N 

p(y\x,0)=l[W(y i \x i ,6 i ) (A.l) 

i=l 

where the sequence 6 = {9\, ■ ■ ■ , 9n} is arbitrary except for a maximum-cost constraint 

1 - 

*"(0) = TfX)W< W (A.2) 

8=1 

In some formulations of the jamming problem, 6 must be selected by the adversary before seeing 
x; in other formulations, 9i is allowed to depend on x% but not on other samples of x [171 GO]; yet 
in other formulations (the A*VC model [IE]), 9{ is allowed to depend on Xj for all j < i. 

If is allowed to depend on the entire sequence x in a noncausal manner (as opposed to the 
above formulations of the AVC problem), the problem with maximum distortion constraint (|2.5p 
may be formulated as (jA.ip and (|A.2[) with state 9, channel W, and cost I defined below. Let 
= (9', 9") where 9' G X and 9" G y, hence 9 = X x y. Let 

l(9) = d(9',9"), W(y\x,9)=I{y = 9"}. 

The maximum-cost constraint (|A.2|) is then equivalent to the maximum-distortion constraint (|2.5|) . 
with Z max = I?2- The sequence 9" may be chosen deterministically or stochastically, using an 
arbitrary distribution. 

B Error Exponents for Channels Without Side Information 

This appendix summarizes some known results on random-coding error exponents. 

Single DMC: Let py\x an d Px be the channel law and input p.m.f., respectively. Referring to 
[T6| p. 165-166], we have 

E r (R,p x ,p Y \x) = ¥™{D{py\x\\py\x\px) + \Ixy{px,Py\x) ~ R\ + ], (B.l) 

Py\x 

Compound DMC: Here Py\x belongs to a set A. We have 
E r (R,p x ,A) = min E r (R,p x ,PY\x) 

= mm mm[D(p Y \x\\PY\x\ Px) + \Ixy(px,Py\x) ~ R\ + ] (B.2) 

Py\x£A Vy\x 

which is zero if R > min py|xe ^ I X y (px,Py\x)- 
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Private Watermarking: the set A is defined by the distortion constraint f|2.3|) . Then 

E^ AM (R,D 1 ,D 2 ) = max min _[I S y\x(ps,Px\S>Py\Xs) 

Px\s PY}xs eA 

+ \Ixy\s(ps,Px\s,Py\xs) ~ R\ + ] (B.3) 

where A = {py\xs : J2 s Ps( s )Px\s(x\s)p Y \xs(y\x, s)d(x, y) < D 2 }. The maximization over 
Px\s 1S a ^ so subject to a distortion constraint. 

Jamming with channel state S selected independently of input X [X 71 l20j . We have 

E J r am (R) = maxmin min [D(Pysx\\py\sxPxPs) + \Ixy(px,Py\x) ~ R\ + ] 

Px PS Pysx ■ 

Px = Px , Ps — Ps 



(B.4) 



C Proof of Proposition 14.11 



The set A, denoted here as V Y \x(D 2 ), is the set of DMC's that introduce maximum Hamming 
distortion D 2 . Let the attack channel p Y \ x ' 3e the BSC with crossover probability D 2 . Considering 
p Y \x ma y n °t ^ e * ne wors t channel, we have 

C pub = sup max min Jl[psPxu\SPy\x) 

< sup max Jl(psPxu\sPy\x) 

L PxulsCPxulsiL'Di) 1 
= g*(Di,D 2 ), (C.l) 



where the last step is derived in [8l |9] . The function g* is defined in (14.11) . 

Next we prove that C pub > g*(D u D 2 ). Consider D 1 = D[9, where D[ £ [0, ±] and e [0, 1]. 
Let p\j\ s be the BSC with crossover probability D[. Furthermore, X = U makes the distortion 
equal to D[. (Note that L = \U\ = 2 in this case.) Clearly, 

C^ h (D[) > mm J l (psP xu \ s Py\x) 

Py\x£Py\x\ d V 

min I(X;Y) - (1- h(D[)) 

Py\x£Py\x{D2) *> v ' 

I(U;S) 

= (l - h (D 2 )) - (1 - h(D[)) 

= h(D[)-h(D 2 ), (C.2) 



where 



is achieved by Py\ x - 



min I(X;Y) = 1 - h(D 2 ) 
Py\x^Py\x( d 2) 
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Using time-sharing arguments, Barron et al. [8] proved that capacity is a concave function of 
D\ in the case A = {Py\x^' ^ can be shown that their result holds in the case A = Vy\x{D 2 ) 
considered here. Therefore we have 



c pub (Di) = c puh (D[9) > ec pub (D[) > e (h(D[) - h(D 2 )) , ye e [o, i]. 

It may be verified that 

ii ti i 

Therefore 



max 9 (h(D[) - h(D 2 )) = g*(D 1 ,D 2 ). 
o<e<i v ' 



C^ h >g*(D u D 2 ). 
Prom ([(HI) and (^3j) . we conclude that C pub = g*(D 1 ,D 2 ); also \U\ = 2. 



(C.3) 

□ 



D Proof of Proposition 14.21 

From (|3.13p . we have 

E^ AM >P nb (R) = sup min max min 

L Ps Pxu\s£'Pxu\s( l , d i)py\xus£'Py\xus( d 2) 

+ \Jl(psPxu\sPy\xus) ~ R\ 

Step 1. First we prove that 

F{Di,D 2 ) = sup min max min Jl(PsPxu\sPy\XUs) 

L PS Pxu\s£'Pxu\s{ l > d i)Py\xus£'Py\xus( d 2) 

£rpub 



D(ps\\ps) + Iy-us\x(PsPxu\sPy\xus) 

(D.l) 



(D.2) 



with equality if ps = Ps- 

Referring to (14, ip . we first consider the regime in which time sharing is not needed: D\ > 5 2 = 
1 - 2~ Tl ^ and therefore C pub = h{D x ) - h(D 2 ). Letting U = X and p* x]s be the BSC with 
crossover probability D±, we obtain a lower bound on F(D\, D 2 ): 

F(D 1 ,D 2 ) > min min Jl(Jps P* x \sPy\Xs) 



mm mm 

Ps Py\xs^'Py\xs{ d '2) l 

min min 

Ps Py\xsCPy\xs( d 2) l 



IX;y{PSP* X \sPY\Xs) ~ IX;S(PS P*X\s) 

Ixx (ps P* x \sPy\xs) ~ {H D i * Po) - h(Di)) (D.3) 



where we use the shorthand po = ps(0). Next, write pyixs as 



Py\xs 


XS = 00 


X5 = 10 


XS = 01 


XS = 11 


Y = 


1 -e 


/ 


1-5 


h 


Y = 1 


e 


1-/ 


5 


l-/i 
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The p.m.f. of X induced by ps and p* x \ S is given by 

Px = (pxo,Pxi) 

= (po(l - D x ) + (1 - po)£>i, Po^i + (1 -Po)(l - Ih)) . 

We derive 

min ^ Jx;y(psP* x \sPy\xs) 

PY\XS^Y\XSK D 2) 

= min Mpxo(1 - a) + pxifi) ~ PxoH 1 ~ a) - pxiHP) 

e,f,g,h: 

P0« 1 -- D l) e + D l/) + ( 1 -P0)(( 1 -- D l)' 1 +- D l9)<- D 2 

= h(D x *p )-h(D 2 ) (D.4) 

where 

Po (l- D 1 )e + (l-p )D 1 g ppDif + (1 - p )(l - D^h 

a = and p = . 

The minimum is achieved by 

a * = -P 2 Pxi ~ £>2 ^* _ £>2 Pxo - D2 
pxo 1 - 2£ 2 pxi 1 - 2Z? 2 ' 

Combining (|D.3|) and (|D.4p . we obtain 

F(D 1 ,D 2 ) >h(D 1 )-h(D 2 ) = C pub . (D.5) 

In the case Z?i < S 2 , capacity is achieved using time-sharing: C pub > h{D\) — h(D 2 ). Similarly 
to [8], it can be shown that F(D\, D 2 ) is a nondecreasing concave function of D\. Hence, 

F(D 1 ,D 2 ) = F(D\6,D 2 ) > max 6F(D\,D 2 ) > max (h(D\) -h(D 2 )) = C pub . (D.6) 

0<6K1 0<6»<1 v ' 

For all values of D\, letting p$ = Ps m (|D.2p and further restricting the minimization over 
Py|Xf/S> we have 

F(D 1 ,D 2 ) <sup max min J L (psPxt/|sPy|x) = ^ pub . (D.7) 

L Pxu\s£Pxu\s{ l , d i)Py\x£Py\x{D2) 

Combining (jDT5|> . dDTHj) and (|D?fl) . we obtain (fD~2l . 

Step 2. The first two bracketed terms in (|D.ip are nonnegative. This yields a lower bound on 

s CAM,pub (jR) . 

£CAM,pub (jR) > supnjin max min |J L fePXC7|5Py|XC/5)-^l + 

L PS Px(7|sS'Px(7|s(- L .- D l)Pv-|X!7SG'Py|X(7s(- D 2) 

= |CP ub — (D.8) 

where the equality is due to (|D,2p , 

Step 3. If we fix ps = Ps = (| , |) and restrict pyjsc/x to be of the form Py\x-> we obtain an 
upper bound on E^ AM,puh (R): 

s CAM lP ub (fl) < sup m min iJi^Pxc/isPyix)-^! 4 " 

L Pxi7|sGPjfi7|s( i: '> I) l)Py|xG , Py|x(-D2) 

= |C pub -i?|+. (D.9) 
Step 4. Combining flES} and (EH]), we obtain £ r CAM ' pub (i?) = |C pub -R\+. □ 
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E Proof of Proposition 14.41 



We have 

C deg = max min I(X;Y). (E.l) 

Let a = f>x(l)> e = Py|x(l|0)j an d f = ?V|x(0|1)j which satisfy the distortion constraints 

a<£>i, (1 -a)e + af < D 2 . 
Substituting these probabilities into (lE.lj) . we obtain 

C dcg = max min \h((l - a)(l - e) + af) - (1 - a)h(e) - ah(f)} . 

a<D 1 (1— a)e+af<D 2 

Solving the above max-min problem in the case D\ > 5 2 = 1 — 2 h< - D2 \ we obtain the optimal 
p* x and Py-\x ^ om 

D 2 {Dx-D 2 ) D 2 (l-D 1 -D 2 ) 
a = Di, e=- — — J 



(l-Di)(l-2D 2 y D x {l-2D 2 ) 

After some algebraic simplifications, we obtain C deg = h{D\) — h(D 2 ). Applying the same time- 
sharing argument as in the proof of Prop. 14. 1\ we obtain C dcg = g*(Di,D 2 ), which is the same as 
the capacity C pub for the public watermarking game. □ 

F Proof of (J5718D 

The inequality 1 - JJ^l - atf* < 1 being trivial, it remains to prove that 

l-Y[(l-a t ) u <^«A 

i i 

or equivalently, 

>i-j2 aiti - 

i i 

Define the K- vector 1 whose components are all equal to 1, the K- vectors a and t with components 
{a-i} and {U}, and Q = [1, 00)^, the domain of t. Denote by V/ the gradient vector of a function 
/ defined on R^, and by a • b the dot product of two vectors in W K . Define the functions 

/(t) = 15(1-00* (F.l) 

i 

g(t) = /(l) + (t-l)-V/(l) (F.2) 
h(t) = 1 - a ■ t (F.3) 

We need to prove that /(t) > h(t) for all t <E and a £ [0, l) K . In Step 1 below we establish that 
/(t) > S'(t). In Step 2, we prove that g(t) > h(t). 
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Step 1. The function g(t) describes a hyperplane tangent to the graph of f(t) at t = 1. The 
function /(t) may be written as 



/(t) = exp|^t i ln(l-a i )| 



(F.4) 



It is convex and therefore f(t) > g(t), owing to the hyperplane separation theorem. 
Step 2. From (fR2|) and (|R3|) . we have 

df(t) 



g (t)-h(t) = [f(l)-h(l)]+Y,(ti-l) 



t = l 



(F.5) 



Observe that /(l) = Y\i(^ ~ a i) an d = 1 — oti\ therefore we have the well-known inequality 
/(I) > Next, since each term t , — 1 in (|F.5|) is nonnegative for t G fi, it suffices to prove that 



df(t) 



to establish that g(t) - h(t) > for all t G SI 
From (|F.4p . we obtain 

df(t) 



> -a,: 



(F.6) 



t=i 



dU 



/(l)ln(l-ai), Vt, 



(F.7) 



t=i 



where < /(l) < 1 — «£. Since ln(l — a«) < 0, this implies 

/(l) ln(l - ai) > (1 - ai) ln(l - a,). (F.8) 

We now prove that 

(l-ai)ln(l-ai)>-ai (F.9) 

which, combined with (|F.8|) and (|F.7p . will establish (|F.6|) . Putting x = 1 — ai, we apply the 
inequality In - < ^ — 1 to claim that 

xlnx = — x In — > —x ( 1 | = x — 1 

x \x 



which proves ()F.9j) . The proof is complete. 



□ 



G Proof of Proposition 13.81 

The variational distance for p,p' € Vy is defined as \\p — p'\\ = ^2 y \p(y) —p'(y)\ and extended to 
conditional pmf's p,p' &Vy\x as \\p ~ p'W = max x \p(v\ x ) ~ p'(y\ x )\- 



Lemma G.l ]lb\ p. 33]. For any p,p' G Vy , we have 

1 



\\p-p'\\ < < 



H p (Y)-H p ,(Y)\< 9 log 



\y\ 
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Lemma G.2 For any px £ Vx andp,p' £ Vyix, we have 

\\P-P'\\ <0<\ =► \H PxP (Y\X) - H Pxp ,(Y\X)\ < eiog 1 -^. 



\H pxP (Y\X)-H pxp ,(Y\X)\ 



Proof: 

Y,Px(x)[H P (Y\X = x) - H P <(Y\X = x)} 

X 

< max\HJY\X = x)-H p ,(Y\X = x)\ 

X 

where the last inequality follows from Lemma |G. 11 

Proof of Proposition 13.81 

The upper bound is straightforward. We now derive the lower bound. 
Step 1. Define a discretized set Ai C A of attack channels as follows: 

Ai = {q&A : q{y\x,s a )e{^r\2l-\--- , 1} Vy,x,s a }. 

Step 2. Any attack channel p € A satisfies 

s ^p x {x)d 2 {x) < D 2 

X 

where 

d 2 (x,s a )^^2p(,y\x,s a )d{x,y) (G.l) 
y 

is the distortion introduced by p when X = x and S a = s a . 

We construct an approximation p of p with the following properties: 

]Tp(y|*, S a ) = 1 (G.2) 

y 

J2p(y\x,s a )d(x,y) < d 2 (x,s a ), Vx,s a , (G.3) 
y 

hence p € Ai- The construction of p is as follows. Define p + (y\x, s a ) as the least upper bound on 
p(y\x, s a ) in the set 2 • • • , 1}. Therefore < p + (y\x, s a ) — p(y\x, s a ) < l" 1 . Define 

k = ^2(lp + (y\x,s a )-lp(y\x,s a )) 
y 

= iY,p + {y\*^ a )-i e{o,i,--- ,|^|-i}, (G.4) 
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dk(x) as the sum of the k largest values of d(x,y) when y ranges over y, and yt{x) as the set of h 
corresponding values of y, that is, we have dk(x) = J2 y ey k (x) d(x,y)- Now let 

p(y\x,s a )=p + (y\x,s a ) - j l {ye y k} . (G.5) 

We have the following properties: 

k 

J2p(y\ x ^ a ) = J2p + (y\ x ^ a ) -j = }2p(y\x,s a ) = i; 

y y y 

therefore (|G.2j) holds. Also 

XI d ( x > y)p(y\ x > s ") = X s<1 ) ~ X d ( x > y)(p + (yl x > sa ) - p(y\ x ^ s(1 )) 

y y y 

+ d i x , y){P + (y\x, s a ) - p{y\x, s a )). 
y 

From (|G.1|) . the first sum in the right side is equal to d2(x,s a ). Owing to (|G.5p . the second sum is 
equal to 

J2d(x,y)jl {ye y k} = -d k (x). 

y 

The third sum is equal to 

j^2d(x,y)(lp + (y\x,s a )-lp(y\x, S a )) < jd k (x) 
y 

where the inequality holds because 

lp + (y\x, s a ) - lp(y\x, s a ) < 1 and s ^(lp + (y\x,s a ) - lp(y\x,s a )) = k. 

y 

Hence (|G.3j) holds as well, and p G Ai- The cardinality of Ai is at most (I + 1)^1 ^ 
By construction of p, we have 

\y\ Ao 



\\PY\XS a ~ PY\ 



xs a 



< 
- I 



Step 3. Consider an alphabet IA of arbitrarily large cardinality. We have 

J M (.) = I(U] YS d ) - I(U; S e ) = H(Y\S d ) - H{Y\US d ) + I(U; S d ) - I(U; S e ), 

hence 

\J\U\(PsPxU\S e PY\XS a ) - J\U\{PSPxU\S e PY\XS a )\ 
= \H PY]xsa (Y\S d ) - H PY]xsa {Y\US d ) - Hp nxsa {Y\S d ) + Hp Y]xsa (Y\US d )\ 

< \H PY]xsa (Y\S d ) - H pYlxsa (Y\S d )\ + \H PY]xsa (Y\US d ) - H pnxsa (Y\U S d )\ 

< 2 e\og^ = 2\y\^- (G.6) 
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where the second inequality is obtained by application of Lemmas IG.ll and IG.2I 

Step 4. By application of Caratheodory's theorem, given a pmf pxus e where U has arbitrarily 
large cardinality, and given L real- valued functionals fi,l < i < L defined over the set Vxs e , there 
exist L elements ux, ■ ■ ■ ,ul oilA and L nonnegative numbers a\, • ■ ■ ,cul summing to 1 such that 

L 

^2pu(u)fi{px S e\u=u) = ^2a u fi{pxs-\u=u), i = l,2,-- - ,L. 
ueu u=i 

The payoff function in the mutual-information game takes the form 

J\u\{psPxu\s*PY\xs«) = I(U;YS d )-I(U;S e ) 

= ^2pu(u)[-H(YS d \U = u)+ H{S e \U = u)] + H(YS d ) - H(S e ). 

UEU 

We apply Caratheodory's theorem to our problem by letting 

fi(Pxs<=\u=u) = Pxse(x,s e ), 1 < i(x,s e ) < \X\ \S e \ - 1, 

fi(p X S*\u=u) = H(YS d \U = u)-H(S e \U = u), \X\ \S e \ <i{p Y \xs«) < \X\ \S e \ + 

The first \X\ \S e \ — 1 functions correspond to the marginals of pxS e except one, and the next 
= /l-^l 1^1 1 5 "! functions are indexed by the attack channels py\xs a £ A- Hence, defining 
W = {1, • • • ,L}, there exist L nonnegative numbers a±,--- ,oll summing to 1 and a random 
variable U' £ W such that 

Pxu's e {x,u',s e ) = Pxse\u(x,s e \u u >)a u/ Vx,s e 

Jl(PSPXU'\S"PY\XS») = J\U\(PSPXU\S e PY\XS a ) VPY\XS° A- 

Hence it suffices to consider 

\U\ = L = \X\\S*\ + P , \\ x W sa \ - \, 
as stated in (|3.14p . to achieve the maximum in 

max min Jl{psPxu\S-Py\xs<^)- 

Step 5. For any choice of U, we have 

C L = max min Jl{psPxu\s*Py\xs«) 

PXU\S e ^'PxU\S e \ L ^ D l) PY\XS a ^- A 

> max mm J L (p s Pxu\S-PY\xs a )- 2 \y\—r 

Pxu\se£Pxu\se\.L,Di)PY\xs a £Ai I 

-F 8 * ,„„ \ mm , j \u\KPsPxu\s-Py\xs-) ~ 2\y\ —— 

PXU\S e ^XU\3"(.M> D l)PY\X3^^A l 

( d ) log I 

- J 113 * ,„„ ^ min a j \u\(psPxu\s-Py\xs-) ~ 2\y\ ~r ( G - 7 ) 

PXU\SeeP X U\Se(M,£>l)PY\XS°£-A I 
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where (a) is the definition of Cl; (b) is because (|G.6P holds uniformly for all Pxu\s e an( ^PY\xs a £ A; 
and (c) is a consequence of Caratheodory's theorem in Step 4; and (d) holds because A\ C A. 

The inequality (|G.7|) holds in the limit as \JJ\ — ► oo, hence 

C L >C-2|3^ 

which proves the claim. □ 

H Proof of Proposition 13.101 

Define sets Vyie) = {py € : minj / p(y) > e} and similarly, Vy\x( € ) = {Py\x £ 

7V|X : min. Ej j / p(y|:E) > e}, for any e € [0, 1/|3^|]. In preparation for the proof of the propo- 
sition, we define a log-uniform quantizer <!>/ and present three lemmas. 

Pmf quantization. Given I > \y\, we define e = Z -1 and a pmf quantization mapping 

<&z : Vy(e) — > Vy(e) as follows. Define the log-uniform quantizer Q; : [e, 1] — > with Z 
reproduction levels 

Q l = {e = e l \e«- l > 1 --- ,e 2 %e*}. (H.l) 

and quantization function 

W = l !^~<%<c(*-lK i = l,2,-..,Z. (H - 2) 



Observe that the ratio between adjacent reproduction levels, e e | 1 as Z — > oo. Moreover the 

Inl 
I 



difference between adjacent reproduction levels is upper- bounded by 1 - e £ < e In e" 1 = ^. Both 



notions of precision will be useful in the proof. 
For any p € Vy(e), define 

q(y) = Qi(p(y)), (H.3) 

the sum a = ^2 y q(y), and the pmf p(y) = ^q(y)- Hence 

m = *iP(y) = v?ffi y Mv yG3; - (IL4) 
E y Qivpyy)) 

Lemma H.l For any integer I > |3^| > 2 and p,p E Vy{l~ l ), we have 

i 2 7 

|D(p||p) - D(*,p||*,p)| < 2(|^| + 1)^. 



Proof. Let e = Z . Since p(y) > e, it follows from (pL2|) and (|rL3|) that 

£ € p{y) < q(y) < p{y)- (H.5) 
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Summing (|H.5P over y £ y, we obtain 



(H.6) 



We have 



e e < K1 J < _ = < -e 



p(y) 



& q{y) 



(H.7) 



For each y £ y, we have 



In / In I 

\p(y) - $ip(v)\ < \p(y) - q(y)\ + |?(y) - *ip(y)| < -r + (l - a)$ip{y) < [l + *ip(y)]-r, 



hence 



Multiplying the inequalities in (|H.7p and taking logarithms, we obtain 



log 



$zp(y) 



Similarly, 

Hence 
D( 



log 



p{y) 
®ip{y) 



< — e log e 



log/ 



p(y) 



< —r- > v y- 



/ 

(H.8) 

(H.9) 
(H.10) 



D(*,p||<&,p) = ^p(y)iog^-^#^(y)iog $ ' M//) 



>(y) 
p(y) 



$/p(y) 



y l log w " log i^rJ + Y Piy) ~ * lPiy)) log 



Hence 



\D{p\\p) — D($ip\\<&ip)\ < max log 

y 

log I 



$ip(y) 



p(y) 



+ max 

y 



log 



$ip(y) 



p{y) 



+ ||p — &ip\\ max 

y 



log 



$zp(y) 



< 2^p + (|y| + l)^bgZ 



,logZ 

' / 



1 + (l^| + l)^ log/ 



< 2(|^| + 1) 



log 2 / 



where the second inequality follows from (1H.8I) . (|H.9[) . and (|H.10|) . and the third inequality from 
the fact that / > |^| > 2. □ 

The following lemma establishes a bound on the variation in conditional Kullback-Leibler di- 
vergence D(Py\x\\Py\x\px) when the mapping is applied to each pmf Py\x{'\ x ) € Py(e). The 
resulting pmf is denoted by <&iPy\x = Py\x('\ x )> x £ 
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Lemma H.2 For any integer I > \y\ > 2, Pxy = PxPy\x £ 'Pxy, and Py\x?Py\x £ ?V|x( e ); we 
have 

\D{py\x\\py\x\px) - D(^ip Y \x\\^lPY\x\px)\ < 2(\y\ + 1)-^— • 

Proof. 

\D(py\x\\Py\x\px) - D(^iPy\x\\^iPy\x\px)\ 

= ^2px(x)[D(py\X=x\\PY\X=x) ~ D{^iPy\X=x\\^IPY\X=x)} 

x 

< m<Ui\D(pY\X=x\\PY\X=x) - D($ip Y \X=x\\®lPY\X=x)\ 

x 1 

< 2( OT + l)^ 

where the last inequality follows from Lemma lH.ll □ 

Proof of Proposition 13.101 

The lower bound is straightforward. We now derive the upper bound. The class of attack 
channels under expected distortion constraint D 2 is denoted by A(D 2 ); the dependency on px is 
not explicitly indicated. 

Define Ul = {1, • • • , L}, where L is given in (|3.15p . Let IA have arbitrarily cardinality, possibly 
larger than L. Define the shorthands 

•A = T ? YS a s d \xs E an d -Au = 7 ? YS a s d \xus e 
and the functionals (with a little abuse of notation) 

E r,\U\ (Ps e , PXU\S" > PYS a S d \XUS" > PY\XS a ) 

- D(p S epxu\SePYS a S d \XUsA\PSPxU\S E PY\XS a ) + \ J\U\(.PS e PxU\SePYS a S d \XUSe) ~ R \ + 



and 



E r \u\(pse,Bu,13) — max min min 

Pxv\s^ePxu\s«iM,Di) p Ysasd . xuse eBu Py\xs»^B 



Hence 



E r,\U\iPS^PXU\S^PYS-S d \XUS^PY\XS a ), £ Aa, B C .A(H.ll) 



E r,\u\{D2) = vaxuE r \ U \{p s& ,Au,A{D 2 )) 
£ r (L> 2 ) = lim E rM (D 2 ). 

\U\^co 



Let e = l/l and 



Z?2 = ^2(1 + elne) < D 2 (H.12) 
Z?2 = L» 2 -e 13^11 |5 a | 

= D 2 -e(\y\\ \S a \ \S d \D + D 2 lne- 1 ). (H.13) 

The proof consists of the following steps: 



46 



1. (Pmf lifting step). Define the subsets 

A(D 2 ;e) = \p Y \xs°< ^ A{D 2 ) : min p Y \xs-iv\x, s a ) > e \ 

Mt) - \ PYS«s d \xse e A ■ minpy 5o5 d| X5e (y,s a ,s d |x,s e ) > e 
^ 1 !/,a;,s 

of «4.(.D2) and A in which the conditional pmf's py\xs a an d P Y S a S d \XS e are lower-bounded 
by e. Also define Au(e) as the set of all pmf's PYS a s d \xus e whose conditional marginals 
P~YS a S d \XS e ,U=u are m A(e) for all u €U. 

For any p S e, p XU \s- G ^XE/|S e (M, -Di), PYS°S d \XUS* G A/ and Py|XS" G ^(^2)5 we show 
there exist PYs a s d \xus e G •^ i w( e ) an d Py\xs a e ^(-^2! e ) suc ^ that 



E r,\U\ {PS e ; PXC/|5 e 1 PYS a S d \XUS e : Py |X,S a ) 

> Er,\U\(PS°,PXU\S*,P Y S*S<l\XUS°> Py\XS")\ ~ yP>l log 2 | <ga | 5/4 |^ d | 1/ 4 cl/4 

(H.14) 

where the constant c was defined in the statement of the proposition. 

2. (Pmf quantization step). We define finite nets Ai(e) C A(e) and ^(I?2;e) C A(D2',e) whose 
cardinalities are at most ^ 5 and Z'-^ ^ I 5 "', respectively. Also define Aiu(t) as the set of 
all pmf's P~YS a S d \xus e whose conditional marginals PYS a S d \xs e ,u=u are m A(e) for all u £U. 

For any p S e, p X u\S e i P' Y s a s d \XUS e £ A/(e) and p' Y \xs a £ A(D' 2 ;e), we show there exist 
PYS a s d \xus E G -Ai,u{e) &ndp Y \xs a 6 ^(-D 2 ;e) such that 

^v,|tt| fe e , Pxc/|5 e 1 Pys»s<*|xt/s e ' p'y \xs°) 
> s r,|W|(P5«,Pxa|s=)Pys a 5 d ]xf7S=) Pr|xs«) - 5|^| \S a \ \S d \ (H.15) 



From (UTTil) and dHl5|l . for 

I > exp 2 



1 + 



■^log(8|5 a | 5 |5 d |c) 



(H.16) 



we obtain 



E r,\U\ (PS S , PXU\S* > PYS a S d \XUS e i Py ) 

> ^iwife^Pxc/is^Pys^dixTO^ Py|xs<0 - A (0 ( H - 17 ) 



where 

dt log 2 / 



A(l) ^7\y\\S a \\S a \-^. (H.18) 



47 



3. By application of Caratheodory's theorem, we show that for each p$e, the supremum of the 
function E r) \ u \(ps e ,Pxu\S e ^lju( € )^( D ^ e )) over Pxu\S e is achieved for \U\ = L given in 
(|515|> . 

4. Combining the results above, we show that E^l^D'^) > E r \ydD2) — Taking the limit 
as |W| — > oo proves the claim. 



Step 1. (Pmf lifting). Denote by 



1 - - « 1 



= and // S a S d(s a ,s) 



|y| — rWK* ,~ | 5a || 5d | 

the uniform distributions over 3^ and iS a x <S rf , respectively. The average distortion for the hypo- 
thetical attack channel py\xs a = satisfies 

Ed(X, Y) < maxEd(x, Y) = max r— V d(x, y) = D. (H.19) 

x x \y\ ^-^ 

y 

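To each $p_{Y|XS^a} \in \mathcal{A}(D_2'')$ and $\tilde p_{YS^aS^d|XUS^e} \in \mathcal{A}_{\mathcal{U}}$, associate the conditional pmf's

$$\begin{aligned}
p'_{Y|XS^a}(y|x,s^a) &= (1 - \epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d|)\, p_{Y|XS^a}(y|x,s^a) + \epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d|\, \mu_Y(y),\\
\tilde p'_{YS^aS^d|XUS^e}(y,s^a,s^d|x,u,s^e) &= (1 - \epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d|)\, \tilde p_{YS^aS^d|XUS^e}(y,s^a,s^d|x,u,s^e) \\
&\quad + \epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d|\, \mu_Y(y)\, \mu_{S^aS^d}(s^a,s^d), \qquad \forall y,x,u,s,
\end{aligned} \tag{H.20}$$

which are slight modifications of $p_{Y|XS^a}$ and $\tilde p_{YS^aS^d|XUS^e}$ and are lower-bounded by $\epsilon$. We refer to the mappings in (H.20) as "pmf lifting".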
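A quick numerical sanity check of the lifting map (H.20) may help. The sketch below is not the authors' code: it assumes small toy alphabets and random pmf's stored as Python lists (one list per conditioning tuple), and the helpers `random_pmf`, `lift_attack`, and `lift_fake` are hypothetical names. With $\lambda = \epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d| \le 1$, each lifted row should still sum to one and every entry should be at least $\epsilon$.

```python
import itertools
import random


def random_pmf(n, rng):
    """Random strictly positive pmf of length n (toy data only)."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]


def lift_weight(eps, ny, nsa, nsd):
    """lambda = eps |Y| |S^a| |S^d|; lifting needs 0 <= lambda <= 1."""
    lam = eps * ny * nsa * nsd
    if not 0.0 <= lam <= 1.0:
        raise ValueError("pmf lifting requires eps * |Y||S^a||S^d| <= 1")
    return lam


def lift_attack(attack, eps, ny, nsa, nsd):
    """p'_{Y|XS^a} = (1 - lambda) p_{Y|XS^a} + lambda * mu_Y, row by row."""
    lam = lift_weight(eps, ny, nsa, nsd)
    return {key: [(1 - lam) * p + lam / ny for p in row] for key, row in attack.items()}


def lift_fake(fake, eps, ny, nsa, nsd):
    """Lift p~_{YS^aS^d|XUS^e}; each row is a flat list over (y, s^a, s^d)."""
    lam = lift_weight(eps, ny, nsa, nsd)
    uniform = 1.0 / (ny * nsa * nsd)  # mu_Y(y) * mu_{S^aS^d}(s^a, s^d)
    return {key: [(1 - lam) * p + lam * uniform for p in row] for key, row in fake.items()}


# Toy alphabet sizes, chosen only for illustration.
nx, ny, nu, nse, nsa, nsd, eps = 2, 3, 2, 2, 2, 2, 0.01
rng = random.Random(0)
attack = {(x, sa): random_pmf(ny, rng)
          for x, sa in itertools.product(range(nx), range(nsa))}
fake = {(x, u, se): random_pmf(ny * nsa * nsd, rng)
        for x, u, se in itertools.product(range(nx), range(nu), range(nse))}
attack_lifted = lift_attack(attack, eps, ny, nsa, nsd)
fake_lifted = lift_fake(fake, eps, ny, nsa, nsd)
```

The lifted attack rows are in fact bounded below by $\epsilon|\mathcal{S}^a||\mathcal{S}^d|$, while the lifted fake-channel rows reach exactly the bound $\epsilon$ on their uniform component.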

Since average distortion is a linear functional of the attack channel, the average distortion for $p'_{Y|XS^a}$ is upper-bounded by

$$(1 - \epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d|)\, D_2'' + \epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d|\, \bar D \le D_2'.$$

Hence $p'_{Y|XS^a} \in \mathcal{A}(D_2';\epsilon)$.
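The linearity argument can be illustrated numerically: the distortion of the lifted channel is exactly the convex combination of the original distortion and the distortion of the uniform channel $\mu_Y$, and the latter is at most $\bar D$ by (H.19). The sketch below is a hedged illustration only; the distortion matrix `d`, the input pmf `p_xsa`, and the helper names are hypothetical toy data, not quantities from the paper.

```python
import itertools
import random


def random_pmf(n, rng):
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]


def avg_distortion(d, p_xsa, channel):
    """E d(X,Y) = sum over (x, s^a, y) of p_{XS^a}(x, s^a) p_{Y|XS^a}(y|x, s^a) d(x, y)."""
    return sum(p_xsa[(x, sa)] * row[y] * d[x][y]
               for (x, sa), row in channel.items() for y in range(len(row)))


def d_bar(d):
    """max_x (1/|Y|) sum_y d(x, y), the right-hand side of (H.19)."""
    return max(sum(row) / len(row) for row in d)


nx, ny, nsa, nsd, eps = 3, 4, 2, 2, 0.005
lam = eps * ny * nsa * nsd                                   # lifting weight
rng = random.Random(1)
d = [[rng.random() for _ in range(ny)] for _ in range(nx)]   # hypothetical distortion matrix
keys = list(itertools.product(range(nx), range(nsa)))
p_xsa = dict(zip(keys, random_pmf(len(keys), rng)))
attack = {k: random_pmf(ny, rng) for k in keys}
lifted = {k: [(1 - lam) * p + lam / ny for p in row] for k, row in attack.items()}
uniform = {k: [1.0 / ny] * ny for k in keys}                 # the channel p_{Y|XS^a} = mu_Y

D_orig = avg_distortion(d, p_xsa, attack)
D_lift = avg_distortion(d, p_xsa, lifted)
D_unif = avg_distortion(d, p_xsa, uniform)
```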

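Since $E_{r,|\mathcal{U}|}(\cdot)$ takes the form $D(\cdot) + |J_{|\mathcal{U}|}(\cdot) - R|^+$ and the map $x \mapsto |x - R|^+$ is 1-Lipschitz, the variation in the error exponent due to the above pmf lifting operations satisfies the upper bound

$$E_{r,|\mathcal{U}|}(\tilde p_{S^e}, p_{XU|S^e}, \tilde p'_{YS^aS^d|XUS^e}, p'_{Y|XS^a}) - E_{r,|\mathcal{U}|}(\tilde p_{S^e}, p_{XU|S^e}, \tilde p_{YS^aS^d|XUS^e}, p_{Y|XS^a}) \le \Delta D + |\Delta J_{|\mathcal{U}|}| \tag{H.21}$$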

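where

$$\Delta J_{|\mathcal{U}|} = J_{|\mathcal{U}|}(\tilde p_{S^e}\, p_{XU|S^e}\, \tilde p'_{YS^aS^d|XUS^e}) - J_{|\mathcal{U}|}(\tilde p_{S^e}\, p_{XU|S^e}\, \tilde p_{YS^aS^d|XUS^e})$$

and

$$\begin{aligned}
\Delta D &= D(\tilde p_{S^e} p_{XU|S^e} \tilde p'_{YS^aS^d|XUS^e} \,\|\, p_S\, p_{XU|S^e}\, p'_{Y|XS^a}) - D(\tilde p_{S^e} p_{XU|S^e} \tilde p_{YS^aS^d|XUS^e} \,\|\, p_S\, p_{XU|S^e}\, p_{Y|XS^a})\\
&= D(\tilde p'_{YS^aS^d|XUS^e} \,\|\, p_{S^aS^d|S^e}\, p'_{Y|XS^a} \,|\, \tilde p_{S^e} p_{XU|S^e}) - D(\tilde p_{YS^aS^d|XUS^e} \,\|\, p_{S^aS^d|S^e}\, p_{Y|XS^a} \,|\, \tilde p_{S^e} p_{XU|S^e}).
\end{aligned}$$

The last equality follows from the chain rule for Kullback-Leibler divergence,

$$D(\tilde p_{S^e} p_{XU|S^e} \tilde p_{YS^aS^d|XUS^e} \,\|\, p_S\, p_{XU|S^e}\, p_{Y|XS^a}) = D(\tilde p_{S^e} \,\|\, p_{S^e}) + D(\tilde p_{YS^aS^d|XUS^e} \,\|\, p_{S^aS^d|S^e}\, p_{Y|XS^a} \,|\, \tilde p_{S^e} p_{XU|S^e}),$$

whose first term is the same for the lifted and the original pmf's and therefore cancels in $\Delta D$.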
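The chain rule used above can be checked on a two-variable toy model. In this sketch, which is an illustration and not part of the proof, `A` stands in for $S^e$ and `B` for the remaining variables; the pmf's are random, divergences are computed in nats (the identity holds in any base), and all helper names are hypothetical.

```python
import math
import random


def random_pmf(n, rng):
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]


def kl(p, q):
    """D(p || q) in nats, assuming q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cond_kl(p_b_given_a, q_b_given_a, p_a):
    """D(p_{B|A} || q_{B|A} | p_A) = sum_a p_A(a) D(p_{B|A=a} || q_{B|A=a})."""
    return sum(pa * kl(pr, qr) for pa, pr, qr in zip(p_a, p_b_given_a, q_b_given_a))


rng = random.Random(2)
na, nb = 3, 5
pa, qa = random_pmf(na, rng), random_pmf(na, rng)
pb = [random_pmf(nb, rng) for _ in range(na)]
qb = [random_pmf(nb, rng) for _ in range(na)]
joint_p = [pa[a] * pb[a][b] for a in range(na) for b in range(nb)]
joint_q = [qa[a] * qb[a][b] for a in range(na) for b in range(nb)]

lhs = kl(joint_p, joint_q)                # D(p_{AB} || q_{AB})
rhs = kl(pa, qa) + cond_kl(pb, qb, pa)    # marginal term + conditional term
```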

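The effect of pmf lifting on $\Delta D$ is as follows. By convexity of conditional Kullback-Leibler divergence [16], from (H.20) we have

$$\begin{aligned}
&D(\tilde p'_{YS^aS^d|XUS^e} \,\|\, p_{S^aS^d|S^e}\, p'_{Y|XS^a} \,|\, \tilde p_{S^e} p_{XU|S^e})\\
&\quad\le (1 - \epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d|)\, D(\tilde p_{YS^aS^d|XUS^e} \,\|\, p_{S^aS^d|S^e}\, p_{Y|XS^a} \,|\, \tilde p_{S^e} p_{XU|S^e})\\
&\qquad + \epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d|\, D(\mu_Y \mu_{S^aS^d} \,\|\, p_{S^aS^d|S^e}\, \mu_Y \,|\, \tilde p_{S^e} p_{XU|S^e})\\
&\quad\le D(\tilde p_{YS^aS^d|XUS^e} \,\|\, p_{S^aS^d|S^e}\, p_{Y|XS^a} \,|\, \tilde p_{S^e} p_{XU|S^e}) + \epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d| \max_{s^a,s^d,s^e} \log \frac{\mu_{S^aS^d}(s^a,s^d)}{p_{S^aS^d|S^e}(s^a,s^d|s^e)}\\
&\quad\le D(\tilde p_{YS^aS^d|XUS^e} \,\|\, p_{S^aS^d|S^e}\, p_{Y|XS^a} \,|\, \tilde p_{S^e} p_{XU|S^e}) + \epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d| \log \frac{1}{|\mathcal{S}^a||\mathcal{S}^d|\, c},
\end{aligned}$$

where the constant $c$ was defined in the statement of the proposition. Hence

$$\Delta D \le \epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d| \log \frac{1}{|\mathcal{S}^a||\mathcal{S}^d|\, c}. \tag{H.22}$$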
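The bound (H.22) can be spot-checked numerically. The sketch below assumes, for illustration only, that $c$ is the smallest entry of $p_{S^aS^d|S^e}$; it draws random toy pmf's, uses `weights[(se, x, u)]` in place of $\tilde p_{S^e}(s^e)\, p_{XU|S^e}(x,u|s^e)$, and works in nats. All names are hypothetical.

```python
import itertools
import math
import random


def random_pmf(n, rng):
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]


def cond_div(fake, attack, p_sasd, weights, ny, nsa, nsd):
    """D(p~_{YS^aS^d|XUS^e} || p_{S^aS^d|S^e} p_{Y|XS^a} | weights) in nats.

    fake[(se, x, u)] is a flat list over (y, s^a, s^d) in itertools.product order.
    """
    total = 0.0
    for (se, x, u), w in weights.items():
        row = fake[(se, x, u)]
        for i, (y, sa, sd) in enumerate(itertools.product(range(ny), range(nsa), range(nsd))):
            q = p_sasd[se][sa * nsd + sd] * attack[(x, sa)][y]
            if row[i] > 0:
                total += w * row[i] * math.log(row[i] / q)
    return total


nse, nx, nu, ny, nsa, nsd, eps = 2, 2, 2, 2, 2, 2, 0.01
lam = eps * ny * nsa * nsd
rng = random.Random(3)
keys = list(itertools.product(range(nse), range(nx), range(nu)))
weights = dict(zip(keys, random_pmf(len(keys), rng)))
p_sasd = [random_pmf(nsa * nsd, rng) for _ in range(nse)]   # rows indexed by s^e
c = min(min(row) for row in p_sasd)                          # assumed meaning of c
attack = {(x, sa): random_pmf(ny, rng) for x, sa in itertools.product(range(nx), range(nsa))}
fake = {k: random_pmf(ny * nsa * nsd, rng) for k in keys}
attack_l = {k: [(1 - lam) * p + lam / ny for p in row] for k, row in attack.items()}
fake_l = {k: [(1 - lam) * p + lam / (ny * nsa * nsd) for p in row] for k, row in fake.items()}

delta_D = (cond_div(fake_l, attack_l, p_sasd, weights, ny, nsa, nsd)
           - cond_div(fake, attack, p_sasd, weights, ny, nsa, nsd))
bound = lam * math.log(1.0 / (nsa * nsd * c))
```

The check passes for any seed, because the lifted pair is a convex combination of the original pair and $(\mu_Y\mu_{S^aS^d},\, p_{S^aS^d|S^e}\mu_Y)$, exactly as in the display above.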



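The effect of pmf lifting on $J_{|\mathcal{U}|}(\cdot) = H(YS^d) - H(YS^d|U) - I(U;S^e)$ is as follows. From (H.20), and analogously to (G.6), we have

$$\|\tilde p'_{YS^aS^d|XUS^e} - \tilde p_{YS^aS^d|XUS^e}\| = \bigl\|\epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d|\, \tilde p_{YS^aS^d|XUS^e} - \epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d|\, \mu_Y \mu_{S^aS^d}\bigr\| \le 2\epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d|. \tag{H.23}$$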
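The constant 2 in (H.23) comes from the triangle inequality applied row by row, that is, for each fixed $(x,u,s^e)$. Writing $\lambda = \epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d|$ as shorthand (introduced here for readability), a worked version of the step is:

$$\tilde p'_{YS^aS^d|XUS^e} - \tilde p_{YS^aS^d|XUS^e} = \lambda\,(\mu_Y\mu_{S^aS^d} - \tilde p_{YS^aS^d|XUS^e}), \qquad \|\tilde p' - \tilde p\|_1 \le \lambda\bigl(\|\mu_Y\mu_{S^aS^d}\|_1 + \|\tilde p\|_1\bigr) = 2\lambda.$$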



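Since lifting leaves $\tilde p_{S^e}$ and $p_{XU|S^e}$ unchanged, the term $I(U;S^e)$ does not vary, and

$$\begin{aligned}
|\Delta J_{|\mathcal{U}|}| &= \bigl|H_{\tilde p'}(YS^d) - H_{\tilde p'}(YS^d|U) - H_{\tilde p}(YS^d) + H_{\tilde p}(YS^d|U)\bigr|\\
&\le \bigl|H_{\tilde p'}(YS^d) - H_{\tilde p}(YS^d)\bigr| + \bigl|H_{\tilde p'}(YS^d|U) - H_{\tilde p}(YS^d|U)\bigr|\\
&\le 4\epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d| \log \frac{|\mathcal{Y}||\mathcal{S}^d|}{2\epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d|} = 4\epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d| \log \frac{1}{2\epsilon|\mathcal{S}^a|},
\end{aligned} \tag{H.24}$$

where the last inequality follows from (H.23) and Lemmas G.1 and G.2.
Combining (H.21), (H.22), and (H.24), we obtain (H.14).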
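A numerical spot-check of (H.24) is sketched below. It assumes, without confirmation from this excerpt, that Lemmas G.1 and G.2 amount to the standard entropy continuity bound $|H(P) - H(Q)| \le \theta \log(|\mathcal{A}|/\theta)$ whenever $\|P - Q\|_1 \le \theta \le 1/2$, applied with $\theta = 2\epsilon|\mathcal{Y}||\mathcal{S}^a||\mathcal{S}^d|$. The toy sizes, the random pmf's, and the helper `ysd_entropies` are hypothetical; logarithms are natural.

```python
import itertools
import math
import random


def random_pmf(n, rng):
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]


def entropy(p):
    return -sum(v * math.log(v) for v in p if v > 0)


def ysd_entropies(weights, fake, ny, nsa, nsd, nu):
    """(H(Y S^d), H(Y S^d | U)) for the joint weights(s^e, x, u) * fake(y, s^a, s^d | s^e, x, u)."""
    p_u = [0.0] * nu
    p_u_ysd = [[0.0] * (ny * nsd) for _ in range(nu)]    # joint p(u, y, s^d)
    for (se, x, u), w in weights.items():
        p_u[u] += w
        for i, (y, sa, sd) in enumerate(itertools.product(range(ny), range(nsa), range(nsd))):
            p_u_ysd[u][y * nsd + sd] += w * fake[(se, x, u)][i]
    p_ysd = [sum(p_u_ysd[u][j] for u in range(nu)) for j in range(ny * nsd)]
    h_cond = sum(p_u[u] * entropy([v / p_u[u] for v in p_u_ysd[u]])
                 for u in range(nu) if p_u[u] > 0)
    return entropy(p_ysd), h_cond


nse, nx, nu, ny, nsa, nsd, eps = 2, 2, 3, 2, 2, 2, 0.01
lam = eps * ny * nsa * nsd
theta = 2 * lam
assert theta <= 0.5, "the assumed continuity lemma needs theta <= 1/2"
rng = random.Random(4)
keys = list(itertools.product(range(nse), range(nx), range(nu)))
weights = dict(zip(keys, random_pmf(len(keys), rng)))    # stands in for p~_{S^e} p_{XU|S^e}
fake = {k: random_pmf(ny * nsa * nsd, rng) for k in keys}
fake_l = {k: [(1 - lam) * p + lam / (ny * nsa * nsd) for p in row] for k, row in fake.items()}

h0, hc0 = ysd_entropies(weights, fake, ny, nsa, nsd, nu)
h1, hc1 = ysd_entropies(weights, fake_l, ny, nsa, nsd, nu)
delta_J = (h1 - hc1) - (h0 - hc0)    # I(U; S^e) is unchanged by lifting
bound = 4 * eps * ny * nsa * nsd * math.log(1 / (2 * eps * nsa))
```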






Step 2. (Pmf quantization). Consider $p'_{Y|XS^a} \in \mathcal{A}(D_2';\epsilon)$. For each value of $(x,s^a)$, apply the quantization mapping $\Phi_l : \mathcal{P}_Y(\epsilon) \to \mathcal{P}_Y(\epsilon)$ defined in (H.4) to the pmf $p'_{Y|XS^a}(\cdot|x,s^a)$, and denote by $\Phi_l p'_{Y|XS^a} \in \mathcal{P}_{Y|XS^a}(\epsilon)$ the resulting conditional pmf. Also let

cr{x,s a ) = ^2Qi{p' Y \xs^y\ x ^ a )) ^ ^ 
y 

where the inequality is obtained as in (H.6). Similarly, given $\tilde p'_{YS^aS^d|XUS^e} \in \mathcal{A}_{\mathcal{U}}(\epsilon)$, define the quantized conditional pmf $\Phi_l \tilde p'_{YS^aS^d|XUS^e}$, which belongs to the finite set $\mathcal{A}_{l,\mathcal{U}}(\epsilon)$.
The average distortion associated with $\Phi_l p'_{Y|XS^a}$ is

d(x, y) ®i p' Y \xs a (V l x ' s ") Pxs« (x, s a ) 

^— ^2 d(x,y)Q l (p' YlxS a{y\x,s a ))pxs4 x ^ a ) 



y,x,s 



a{x, S a ) 



y,x,s" 



< 



^— — ^2 d{x,y)p YlxS a{y\x,s a )pxs<>(x,s a ) 



a(x, s a ) 



y,x,s" 



= e- e (l+elne)D 2 
< e~ e e elne D 2 
= D 2 . 

Therefore $\Phi_l p'_{Y|XS^a} \in \mathcal{A}_l(D_2;\epsilon)$.
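The normalization argument in the distortion bound can be illustrated with a toy quantizer. The mapping (H.4) itself lies outside this excerpt, so the sketch below uses a hypothetical stand-in: `quantize_row` rounds each probability down to a multiple of $1/l$ and renormalizes by $\sigma = \sum_y Q_l(\cdot)$. Unlike the actual $\Phi_l$, this toy version does not preserve the lower bound $\epsilon$; it only shows that, whenever $Q_l$ never increases a probability, the distortion of the quantized channel is at most the original distortion divided by $\min \sigma$.

```python
import math
import random


def random_pmf(n, rng):
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]


def quantize_row(p, l):
    """Hypothetical Q_l: round each probability down to a multiple of 1/l,
    then renormalize by sigma = sum_y Q_l(p(y)).  Returns (Phi_l p, sigma)."""
    q = [math.floor(v * l) / l for v in p]
    sigma = sum(q)
    if sigma <= 0:
        raise ValueError("grid too coarse for this pmf")
    return [v / sigma for v in q], sigma


rng = random.Random(5)
nx, ny, nsa, l = 2, 4, 2, 50
keys = [(x, sa) for x in range(nx) for sa in range(nsa)]
p_xsa = dict(zip(keys, random_pmf(len(keys), rng)))
attack = {k: random_pmf(ny, rng) for k in keys}
d = [[rng.random() for _ in range(ny)] for _ in range(nx)]   # hypothetical distortion


def avg_distortion(channel):
    return sum(p_xsa[k] * channel[k][y] * d[k[0]][y] for k in keys for y in range(ny))


quantized, sigma = {}, {}
for k, row in attack.items():
    quantized[k], sigma[k] = quantize_row(row, l)

lhs = avg_distortion(quantized)                      # distortion of Phi_l p'
rhs = avg_distortion(attack) / min(sigma.values())   # (1 / min sigma) * distortion of p'
```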

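For any choice of $\tilde p_{S^e}$, $p_{XU|S^e}$, $\tilde p'_{YS^aS^d|XUS^e} \in \mathcal{A}_{\mathcal{U}}(\epsilon)$, and $p'_{Y|XS^a} \in \mathcal{A}(D_2';\epsilon)$, we now bound the effect of $\Phi_l$ on the error exponent as follows:

$$E_{r,|\mathcal{U}|}(\tilde p_{S^e}, p_{XU|S^e}, \tilde p'_{YS^aS^d|XUS^e}, p'_{Y|XS^a}) - E_{r,|\mathcal{U}|}(\tilde p_{S^e}, p_{XU|S^e}, \Phi_l \tilde p'_{YS^aS^d|XUS^e}, \Phi_l p'_{Y|XS^a}) \le \Delta D + |\Delta J_{|\mathcal{U}|}|. \tag{H.25}$$

Here

$$\Delta J_{|\mathcal{U}|} = J_{|\mathcal{U}|}(\tilde p_{S^e}\, p_{XU|S^e}\, \tilde p'_{YS^aS^d|XUS^e}) - J_{|\mathcal{U}|}(\tilde p_{S^e}\, p_{XU|S^e}\, \Phi_l \tilde p'_{YS^aS^d|XUS^e})$$

and

$$\begin{aligned}
|\Delta D| &= \bigl|D(\tilde p_{S^e} p_{XU|S^e} \tilde p'_{YS^aS^d|XUS^e} \,\|\, p_S\, p_{XU|S^e}\, p'_{Y|XS^a}) - D(\tilde p_{S^e} p_{XU|S^e} \Phi_l \tilde p'_{YS^aS^d|XUS^e} \,\|\, p_S\, p_{XU|S^e}\, \Phi_l p'_{Y|XS^a})\bigr|\\
&= \bigl|D(\tilde p'_{YS^aS^d|XUS^e} \,\|\, p'_{Y|XS^a}\, p_{S^aS^d|S^e} \,|\, \tilde p_{S^e} p_{XU|S^e}) - D(\Phi_l \tilde p'_{YS^aS^d|XUS^e} \,\|\, \Phi_l p'_{Y|XS^a}\, p_{S^aS^d|S^e} \,|\, \tilde p_{S^e} p_{XU|S^e})\bigr|
\end{aligned}$$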
log 2 / 



< 2(\y\\s a \\s d \ + i 



< 3\y\\S a \\S' 



(H.26) 
where the first inequality is obtained by application of Lemma H.2 and the second because $|\mathcal{Y}| \ge 2$.
Next, we have

\\PYS a S*\XUS B ) ~ ®lPYS a S d \XUSe\\ - (1^1 I* 5 "! \ S I + 1 ) ~ = d - 
where the inequality is a straightforward generalization of (H.8). Similarly to (H.24), we obtain



|AJ, W || < 20 log 



\y\ \s a 



2(\y\\S a \\S d \ + l) l -^log- ' 



I °\S a \lnl 



< 2\y\ \S a \ \S dl 



log 2 / 



(H.27) 



where the last inequality holds because loge < 1 and |5 a | In / > 1. 



Combining (H.25), (H.26), and (H.27), we obtain

\ E r,\U\(Ps e ,PxU\S^ PYS a S d \XUS^ Py\XS") ~ E r,\U\(Ps e , PxU\S"i ®l PYS a S d \XUS^ $f Py\XS")\ 



< \AD\ + |AJ| W || < 5\y\ \S a \ \S 



I 



which establishes (H.15).

Combining (H.11), (H.14), and (H.15), for any choice of $\tilde p_{S^e}$ we obtain

E r ,\u\(Ps^Au, A[pi)) - E r> \ u \(p S e,Ai,u, Ai{D 2 ; e)) 

I 



> -5\y\ \s a \ \s d 



I I 



-A{l) + j\y\\S a \\S d \ 

-A(l) + j\y\\S a \\S d \ 



log 2 I - 2 log 



2 | 5 a|5/4| 5 d|l/4 c l/4 
I 



2 | 5 a|5/4| 5( i|l/4 c l/4_ 
1 



log 2 I - 2 log I + t log(16|5 a | 5 \S d \ c) 



> -A(l) (H.28) 

where $\Delta(l)$ was defined in (H.18). The term in brackets is positive when (H.16) is satisfied.

Step 3. (Carathéodory). Define

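$$J_{|\mathcal{U}|}(\tilde p_{S^e}, p_{X|S^e,U=u}, \tilde p_{YS^aS^d|XS^e,U=u}) = H(YS^d) - H(YS^d|U=u) - H(S^e) + H(S^e|U=u), \qquad \forall u \in \mathcal{U},$$

where the quintuple $(S^e, S^a, S^d, X, Y)$, conditioned on $U = u$, is distributed as $\tilde p_{S^e}\, p_{X|S^e,U=u}\, \tilde p_{YS^aS^d|XS^e,U=u}$. Likewise, we view the conditional divergence

$$D(\tilde p_{YS^aS^d|XS^e,U=u} \,\|\, p_{Y|XS^a}\, p_{S^aS^d|S^e} \,|\, p_{X|S^e,U=u}\, \tilde p_{S^e})$$

as a function of $\tilde p_{S^e}$, $p_{X|S^e,U=u}$, $\tilde p_{YS^aS^d|XS^e,U=u}$, and $p_{Y|XS^a}$. We may thus write

$$D(\tilde p_{YS^aS^d|XUS^e} \,\|\, p_{Y|XS^a}\, p_{S^aS^d|S^e} \,|\, p_{X|US^e}\, \tilde p_{S^e}) = \sum_{u \in \mathcal{U}} p_U(u)\, D(\tilde p_{YS^aS^d|XS^e,U=u} \,\|\, p_{Y|XS^a}\, p_{S^aS^d|S^e} \,|\, p_{X|S^e,U=u}\, \tilde p_{S^e}),$$

$$J_{|\mathcal{U}|}(\tilde p_{S^e}, p_{XU|S^e}, \tilde p_{YS^aS^d|XUS^e}) = \sum_{u \in \mathcal{U}} p_U(u)\, J_{|\mathcal{U}|}(\tilde p_{S^e}, p_{X|S^e,U=u}, \tilde p_{YS^aS^d|XS^e,U=u}),$$

and

$$\begin{aligned}
E_{r,|\mathcal{U}|}(\tilde p_{S^e}, p_{XU|S^e}, \tilde p_{YS^aS^d|XUS^e}, p_{Y|XS^a}) &= D(\tilde p_{S^e} \,\|\, p_{S^e}) + \sum_{u \in \mathcal{U}} p_U(u)\, D(\tilde p_{YS^aS^d|XS^e,U=u} \,\|\, p_{Y|XS^a}\, p_{S^aS^d|S^e} \,|\, p_{X|S^e,U=u}\, \tilde p_{S^e})\\
&\quad + \Bigl|\sum_{u \in \mathcal{U}} p_U(u)\, J_{|\mathcal{U}|}(\tilde p_{S^e}, p_{X|S^e,U=u}, \tilde p_{YS^aS^d|XS^e,U=u}) - R\Bigr|^+.
\end{aligned}$$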



The cardinality of the discretized set Ai(D2', e) of attack channels PY\xs a ls l ess than \ x \ I 5 "'. 
Likewise the cardinality of the discretized set A~i(e) of channels p Y s°-s d \xs e * s ^ ess than ' 5 '- 

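We now define the following $L$ functionals over $\mathcal{P}_{SXY}$ (recall that $S = (S^e, S^a, S^d)$):

$$\begin{aligned}
f_i(p_{SXY|U=u}) &= \tilde p_{S^e}(s^e)\, p_{X|S^e}(x|s^e), && 1 \le i(x,s^e) \le |\mathcal{X}||\mathcal{S}^e| - 1,\\
f_i(p_{SXY|U=u}) &= D(\tilde p_{YS^aS^d|XS^e,U=u} \,\|\, p_{Y|XS^a}\, p_{S^aS^d|S^e} \,|\, p_{X|S^e,U=u}\, \tilde p_{S^e}), && |\mathcal{X}||\mathcal{S}^e| \le i(\tilde p_{YS^aS^d|XS^e}, p_{Y|XS^a}) \le |\mathcal{X}||\mathcal{S}^e| + |\mathcal{A}_l(\epsilon)|\,|\mathcal{A}_l(D_2;\epsilon)| - 1,\\
f_i(p_{SXY|U=u}) &= J_L(\tilde p_{S^e}, p_{X|S^e,U=u}, \tilde p_{YS^aS^d|XS^e,U=u}), && |\mathcal{X}||\mathcal{S}^e| + |\mathcal{A}_l(\epsilon)|\,|\mathcal{A}_l(D_2;\epsilon)| \le i(\tilde p_{YS^aS^d|XS^e}) \le |\mathcal{X}||\mathcal{S}^e| + |\mathcal{A}_l(\epsilon)|\,(1 + |\mathcal{A}_l(D_2;\epsilon)|) - 1.
\end{aligned}$$

The first $|\mathcal{X}||\mathcal{S}^e| - 1$ functions correspond to the marginals of $p_{XS^e}$ except one, and the next $|\mathcal{A}_l(\epsilon)|\,(1 + |\mathcal{A}_l(D_2;\epsilon)|)$ functions are indexed by the channels $p_{Y|XS^a} \in \mathcal{A}_l(D_2;\epsilon)$ and $\tilde p_{YS^aS^d|XS^e} \in \mathcal{A}_l(\epsilon)$. Hence, applying Carathéodory's theorem, we conclude there exist $L$ nonnegative numbers $\alpha_1, \ldots, \alpha_L$ summing to 1 and a random variable $U' \in \mathcal{U}_L$ such that

$$p_{SU'XY}(s,u',x,y) = p_{XS^e|U}(x,s^e|u_{u'})\, \alpha_{u'}\, \tilde p_{YS^aS^d|XUS^e}(y,s^a,s^d|x,u_{u'},s^e), \qquad \forall s,u',x,y,$$

$$E_{r,L}(\tilde p_{S^e}, p_{XU'|S^e}, \tilde p_{YS^aS^d|XU'S^e}, p_{Y|XS^a}) = E_{r,|\mathcal{U}|}(\tilde p_{S^e}, p_{XU|S^e}, \tilde p_{YS^aS^d|XUS^e}, p_{Y|XS^a}), \qquad \forall p_{Y|XS^a} \in \mathcal{A}_l(D_2;\epsilon),\ \tilde p_{YS^aS^d|XUS^e} \in \mathcal{A}_{l,\mathcal{U}}(\epsilon).$$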
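The support-reduction idea behind this step can be sketched in a few lines. The code below is the basic Carathéodory reduction only, with hypothetical names and random toy data: it keeps $\sum_u w_u = 1$ and the weighted averages of $m$ given functionals fixed while shrinking the support to at most $m + 1$ points, by repeatedly moving along a null vector of the constraint matrix until one weight vanishes. It does not implement any sharper variant of the theorem, and in the proof the feature vector of each $u$ would be the list of functional values $f_i(p_{SXY|U=u})$.

```python
import random


def null_vector(rows, k, tol=1e-12):
    """Nonzero c with rows @ c = 0, via Gauss-Jordan elimination (needs len(rows) < k)."""
    a = [list(r) for r in rows]
    m = len(a)
    pivots, r = [], 0
    for col in range(k):
        if r == m:
            break
        piv = max(range(r, m), key=lambda i: abs(a[i][col]))
        if abs(a[piv][col]) < tol:
            continue
        a[r], a[piv] = a[piv], a[r]
        pv = a[r][col]
        a[r] = [v / pv for v in a[r]]
        for i in range(m):
            if i != r:
                f = a[i][col]
                a[i] = [vi - f * vr for vi, vr in zip(a[i], a[r])]
        pivots.append(col)
        r += 1
    free = next(j for j in range(k) if j not in pivots)
    c = [0.0] * k
    c[free] = 1.0
    for i, col in enumerate(pivots):
        c[col] = -a[i][free]
    return c


def caratheodory_reduce(weights, features, tol=1e-12):
    """Keep sum(w) and sum(w * f) fixed while reducing the support to <= m + 1 points."""
    w = list(weights)
    m = len(features[0])
    while True:
        support = [j for j, v in enumerate(w) if v > tol]
        if len(support) <= m + 1:
            return [v if v > tol else 0.0 for v in w]
        cols = support[: m + 2]
        rows = [[features[j][i] for j in cols] for i in range(m)]
        rows.append([1.0] * len(cols))   # preserves the total mass
        c = null_vector(rows, len(cols))
        # sum(c) == 0 and c != 0, so c has a positive entry; step until one weight hits 0.
        t, j_star = min((w[j] / cj, j) for j, cj in zip(cols, c) if cj > tol)
        for j, cj in zip(cols, c):
            w[j] = max(w[j] - t * cj, 0.0)
        w[j_star] = 0.0


rng = random.Random(6)
n_points, m = 20, 3                      # hypothetical: 20 values of u, 3 functionals
features = [[rng.random() for _ in range(m)] for _ in range(n_points)]
raw = [rng.random() for _ in range(n_points)]
weights = [v / sum(raw) for v in raw]
reduced = caratheodory_reduce(weights, features)


def mean(ws):
    return [sum(wj * f[i] for wj, f in zip(ws, features)) for i in range(m)]
```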

Hence, given any $\tilde p_{S^e}$, it suffices to consider

|^| = L=|^||5 e | + /^^KI«5|+|5 a l)_ 1 

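to achieve

$$\max_{p_{XU|S^e} \in \mathcal{P}_{XU|S^e}(|\mathcal{U}|, D_1)}\ \min_{p_{Y|XS^a} \in \mathcal{A}_l(D_2;\epsilon)}\ \min_{\tilde p_{YS^aS^d|XUS^e} \in \mathcal{A}_{l,\mathcal{U}}(\epsilon)} E_{r,|\mathcal{U}|}(\tilde p_{S^e}, p_{XU|S^e}, \tilde p_{YS^aS^d|XUS^e}, p_{Y|XS^a}).$$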

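Hence

$$E_{r,|\mathcal{U}|}(\tilde p_{S^e}, \mathcal{A}_{l,\mathcal{U}}(\epsilon), \mathcal{A}_l(D_2;\epsilon)) = E_{r,L}(\tilde p_{S^e}, \mathcal{A}_{l,\mathcal{U}_L}(\epsilon), \mathcal{A}_l(D_2;\epsilon)). \tag{H.29}$$

Step 4. Let $\tilde p^*_{S^e}$ achieve the minimum in

$$E_{r,L}(D_2'') = \min_{\tilde p_{S^e}} E_{r,L}(\tilde p_{S^e}, \mathcal{A}_{\mathcal{U}_L}, \mathcal{A}(D_2'')).$$

It follows from the previous steps that

$$\begin{aligned}
E_{r,L}(D_2'') &\overset{(a)}{=} E_{r,L}(\tilde p^*_{S^e}, \mathcal{A}_{\mathcal{U}_L}, \mathcal{A}(D_2''))\\
&\overset{(b)}{\ge} E_{r,L}(\tilde p^*_{S^e}, \mathcal{A}_{l,\mathcal{U}_L}(\epsilon), \mathcal{A}_l(D_2;\epsilon)) - \Delta(l)\\
&\overset{(c)}{=} E_{r,|\mathcal{U}|}(\tilde p^*_{S^e}, \mathcal{A}_{l,\mathcal{U}}(\epsilon), \mathcal{A}_l(D_2;\epsilon)) - \Delta(l)\\
&\overset{(d)}{\ge} E_{r,|\mathcal{U}|}(\tilde p^*_{S^e}, \mathcal{A}_{\mathcal{U}}, \mathcal{A}(D_2)) - \Delta(l)\\
&\ge \min_{\tilde p_{S^e}} E_{r,|\mathcal{U}|}(\tilde p_{S^e}, \mathcal{A}_{\mathcal{U}}, \mathcal{A}(D_2)) - \Delta(l)\\
&= E_{r,|\mathcal{U}|}(D_2) - \Delta(l),
\end{aligned} \tag{H.30}$$

where (a) follows from the definition of $\tilde p^*_{S^e}$, (b) follows from (H.28) with $\mathcal{U} = \mathcal{U}_L$, (c) follows from (H.29), and (d) holds because $\mathcal{A}_{l,\mathcal{U}}(\epsilon) \subset \mathcal{A}_{\mathcal{U}}$ and $\mathcal{A}_l(D_2;\epsilon) \subset \mathcal{A}(D_2)$.

Equation (H.30) holds for any $\mathcal{U}$, hence

$$E_{r,L}(D_2'') \ge \lim_{|\mathcal{U}| \to \infty} E_{r,|\mathcal{U}|}(D_2) - \Delta(l) = E_r(D_2) - \Delta(l),$$

which concludes the proof. □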